Bug 1922108 - OCS 4.7 4.7.0-242.ci and beyond: osd pods are not created
Summary: OCS 4.7 4.7.0-242.ci and beyond: osd pods are not created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Sébastien Han
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-29 09:23 UTC by Vijay Avuthu
Modified: 2021-05-19 09:19 UTC (History)
CC: 6 users

Fixed In Version: 4.7.0-714.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:18:35 UTC
Embargoed:


Attachments (Terms of Use)
rook-ceph-operator log (245.20 KB, text/plain)
2021-01-29 09:38 UTC, Vijay Avuthu


Links
System ID Private Priority Status Summary Last Updated
Github rook rook pull 7111 0 None closed ceph: do not enable ShareProcessNamespace if HostIPC 2021-02-14 13:46:26 UTC
Github rook rook pull 7116 0 None closed ceph: do not enable ShareProcessNamespace if HostPID 2021-02-14 13:46:26 UTC
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:19:02 UTC

Description Vijay Avuthu 2021-01-29 09:23:41 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

OSD pods are not created in the latest OCS version (4.7.0-242.ci)

Version of all relevant components (if applicable):

openshift installer (4.7.0-0.nightly-2021-01-29-024753)
ocs-registry:4.7.0-242.ci


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1/1

Is this issue reproducible?
NA

Can this issue reproduce from the UI?
Not tried

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install OCS using ocs-ci
2. Installation fails because the OSD pods are not created


Actual results:
osd pods are not created

Expected results:
osd pods should be running


Additional info:

job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/16815/console

must gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/vavuthua-lso/vavuthua-lso_20210112T143119/logs/failed_testcase_ocs_logs_1611905038/deployment_ocs_logs/

Comment 2 Vijay Avuthu 2021-01-29 09:25:36 UTC
> pods list

$ oc get pods
NAME                                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-fcsjh                                            3/3     Running     0          71m
csi-cephfsplugin-g84rg                                            3/3     Running     0          71m
csi-cephfsplugin-provisioner-6fdfcfdc9c-4fjjh                     6/6     Running     0          71m
csi-cephfsplugin-provisioner-6fdfcfdc9c-826ks                     6/6     Running     0          71m
csi-cephfsplugin-t7hxr                                            3/3     Running     0          71m
csi-rbdplugin-f5h5h                                               3/3     Running     0          71m
csi-rbdplugin-g5rx7                                               3/3     Running     0          71m
csi-rbdplugin-gjvrn                                               3/3     Running     0          71m
csi-rbdplugin-provisioner-69596c4bf8-7dwmz                        6/6     Running     0          71m
csi-rbdplugin-provisioner-69596c4bf8-mp97b                        6/6     Running     0          71m
must-gather-2x2w4-helper                                          1/1     Running     0          28m
must-gather-l8g9k-helper                                          1/1     Running     0          53m
noobaa-core-0                                                     1/1     Running     0          68m
noobaa-db-pg-0                                                    0/1     Pending     0          68m
noobaa-operator-6f4db5955b-w6n65                                  1/1     Running     0          71m
ocs-metrics-exporter-5b8cb75c94-btcvh                             1/1     Running     0          71m
ocs-operator-6cb6dff494-4gzc9                                     1/1     Running     0          71m
rook-ceph-crashcollector-compute-0-7fdccc8889-fbfg7               1/1     Running     0          69m
rook-ceph-crashcollector-compute-1-f5d447b76-z5qkn                1/1     Running     0          70m
rook-ceph-crashcollector-compute-2-868fcfc97f-455dx               1/1     Running     0          69m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-76d9b79c5tw6p   2/2     Running     0          67m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-76f7d6b67bdjm   2/2     Running     0          67m
rook-ceph-mgr-a-55cc97b766-jhv9l                                  2/2     Running     0          69m
rook-ceph-mon-a-69d864d7f8-7r4kv                                  2/2     Running     0          70m
rook-ceph-mon-b-5db5d5b599-qgqjj                                  2/2     Running     0          69m
rook-ceph-mon-c-7d7bc4cb9d-kzr5x                                  2/2     Running     0          69m
rook-ceph-operator-5fd7d877f-tc2qq                                1/1     Running     0          71m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-zbf6n-mvh49          0/1     Completed   0          69m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-zql2z-8q9hx          0/1     Completed   0          69m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-f2rf5-4d9gf          0/1     Completed   0          69m
$

Comment 3 Vijay Avuthu 2021-01-29 09:37:04 UTC
> pvc

$ oc get pvc
NAME                           STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0              Pending                                                                        ocs-storagecluster-ceph-rbd   77m
ocs-deviceset-0-data-0-zbf6n   Bound     pvc-771f1c90-446e-48a0-9d99-5addbdf559b4   100Gi      RWO            thin                          78m
ocs-deviceset-1-data-0-zql2z   Bound     pvc-36d044e8-82e4-418e-bf62-a97fb28674c5   100Gi      RWO            thin                          78m
ocs-deviceset-2-data-0-f2rf5   Bound     pvc-a5d8ded1-c88d-4f8b-80c9-7014074878e1   100Gi      RWO            thin                          78m
rook-ceph-mon-a                Bound     pvc-29933f95-a7f8-4d17-bb84-8b6e47b653b5   10Gi       RWO            thin                          80m
rook-ceph-mon-b                Bound     pvc-e4bc6993-0c00-488d-9a1a-9bcbb5cccaf7   10Gi       RWO            thin                          80m
rook-ceph-mon-c                Bound     pvc-3b480e9b-26c6-4df1-a10d-0ec2fe0a7d10   10Gi       RWO            thin                          80m

> from the rook-ceph-operator-5fd7d877f-tc2qq log, the rook-ceph-osd-* Deployments are rejected as invalid because ShareProcessNamespace and HostPID are both enabled

2021-01-29 08:16:09.112448 W | op-osd: failed to create osd deployment for pvc "ocs-deviceset-1-data-0-zql2z", osd {0 ceph 9491a35d-8bb6-413d-a06b-c192a50e4d6a  /mnt/ocs-deviceset-1-data-0-zql2z   true root=default host=ocs-deviceset-1-data-0-zql2z rack=rack0 false raw bluestore}. Deployment.apps "rook-ceph-osd-0" is invalid: spec.template.spec.securityContext.shareProcessNamespace: Invalid value: true: ShareProcessNamespace and HostPID cannot both be enabled
2021-01-29 08:16:18.403045 I | op-osd: osd orchestration status for node ocs-deviceset-2-data-0-f2rf5 is computingDiff
2021-01-29 08:16:18.423835 I | op-osd: osd orchestration status for node ocs-deviceset-2-data-0-f2rf5 is orchestrating
2021-01-29 08:16:19.219964 I | op-osd: osd orchestration status for node ocs-deviceset-0-data-0-zbf6n is computingDiff
2021-01-29 08:16:19.243202 I | op-osd: osd orchestration status for node ocs-deviceset-0-data-0-zbf6n is orchestrating
2021-01-29 08:16:22.635392 I | op-osd: osd orchestration status for node ocs-deviceset-2-data-0-f2rf5 is completed
2021-01-29 08:16:22.635406 I | op-osd: starting 1 osd daemons on pvc ocs-deviceset-2-data-0-f2rf5
2021-01-29 08:16:22.635411 I | op-osd: OSD will have its main bluestore block on "ocs-deviceset-2-data-0-f2rf5"
2021-01-29 08:16:22.635416 I | cephclient: getting or creating ceph auth key "osd.1"
2021-01-29 08:16:22.957492 W | op-osd: failed to create osd deployment for pvc "ocs-deviceset-2-data-0-f2rf5", osd {1 ceph b628e29d-abc1-4dc0-90b6-9e91dab2af8c  /mnt/ocs-deviceset-2-data-0-f2rf5   true root=default host=ocs-deviceset-2-data-0-f2rf5 rack=rack1 false raw bluestore}. Deployment.apps "rook-ceph-osd-1" is invalid: spec.template.spec.securityContext.shareProcessNamespace: Invalid value: true: ShareProcessNamespace and HostPID cannot both be enabled
2021-01-29 08:16:23.431758 I | op-osd: osd orchestration status for node ocs-deviceset-0-data-0-zbf6n is completed
2021-01-29 08:16:23.431775 I | op-osd: starting 1 osd daemons on pvc ocs-deviceset-0-data-0-zbf6n
2021-01-29 08:16:23.431780 I | op-osd: OSD will have its main bluestore block on "ocs-deviceset-0-data-0-zbf6n"
2021-01-29 08:16:23.431785 I | cephclient: getting or creating ceph auth key "osd.2"
2021-01-29 08:16:23.742978 W | op-osd: failed to create osd deployment for pvc "ocs-deviceset-0-data-0-zbf6n", osd {2 ceph 0632f858-89fe-4fc6-ab92-9074d5ef1fd5  /mnt/ocs-deviceset-0-data-0-zbf6n   true root=default host=ocs-deviceset-0-data-0-zbf6n rack=rack2 false raw bluestore}. Deployment.apps "rook-ceph-osd-2" is invalid: spec.template.spec.securityContext.shareProcessNamespace: Invalid value: true: ShareProcessNamespace and HostPID cannot both be enabled
2021-01-29 08:16:23.748060 I | op-osd: 3/3 node(s) completed osd provisioning
2021-01-29 08:16:23.748091 I | op-osd: start provisioning the osds on nodes, if needed
2021-01-29 08:16:23.752693 I | op-osd: 0 of the 0 storage nodes are valid
2021-01-29 08:16:23.752752 W | op-osd: no valid nodes available to run osds on nodes in namespace openshift-storage
2021-01-29 08:16:23.752760 I | op-osd: start osds after provisioning is completed, if needed
2021-01-29 08:16:24.048234 I | op-osd: completed running osds in namespace openshift-storage
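The API server rejects any PodSpec that sets both `shareProcessNamespace: true` and `hostPID: true`, since hostPID already puts the containers in the host's PID namespace. The upstream fixes (rook pulls 7111 and 7116) guard against this by not enabling ShareProcessNamespace when HostPID or HostIPC is set. A minimal sketch of that guard, using a simplified stand-in struct rather than the real k8s.io/api types (field names mirror corev1.PodSpec, but this is an illustration, not the actual rook code):

```go
package main

import "fmt"

// podSpec is a simplified stand-in for corev1.PodSpec.
type podSpec struct {
	HostPID               bool
	HostIPC               bool
	ShareProcessNamespace *bool
}

// enableSharedProcessNamespace only sets ShareProcessNamespace when neither
// HostPID nor HostIPC is enabled, because the API server rejects the
// combination ("ShareProcessNamespace and HostPID cannot both be enabled").
func enableSharedProcessNamespace(spec *podSpec) {
	if spec.HostPID || spec.HostIPC {
		return // host namespaces in use; leave the flag unset
	}
	t := true
	spec.ShareProcessNamespace = &t
}

func main() {
	// An OSD pod running with hostPID, as in this bug: the flag stays unset.
	osd := &podSpec{HostPID: true}
	enableSharedProcessNamespace(osd)
	fmt.Println(osd.ShareProcessNamespace == nil) // true

	// A pod without host namespaces: safe to enable.
	mgr := &podSpec{}
	enableSharedProcessNamespace(mgr)
	fmt.Println(*mgr.ShareProcessNamespace) // true
}
```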

Comment 4 Vijay Avuthu 2021-01-29 09:38:10 UTC
Created attachment 1751984 [details]
rook-ceph-operator log

Comment 6 Jose A. Rivera 2021-01-29 15:42:51 UTC
ocs-operator does not directly influence the OSD Deployments; those are created and managed by Rook-Ceph. As such, moving this BZ to the relevant component.

Comment 7 Travis Nielsen 2021-01-29 17:30:01 UTC
This looks related to enabling the log collector.
@Seb can you take a look?

Comment 10 Travis Nielsen 2021-02-01 23:26:19 UTC
Merged downstream with https://github.com/openshift/rook/pull/157

Comment 12 Vijay Avuthu 2021-02-03 12:42:07 UTC
Verified deployment job here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/167/console

openshift installer (4.7.0-0.nightly-2021-02-02-223803)
ocs-operator.v4.7.0-249.ci

> All the osd pods are created.
10:59:17 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod  -n openshift-storage --selector=app=rook-ceph-osd -o yaml
10:59:17 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod rook-ceph-osd-0-6bf5df9459-hzgpt -n openshift-storage
10:59:18 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod rook-ceph-osd-1-77749bb57d-df8tb -n openshift-storage
10:59:18 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod rook-ceph-osd-2-75ccb4599f-r4snx -n openshift-storage
10:59:18 - MainThread - ocs_ci.ocs.ocp - INFO - 3 resources already reached condition!

> job failed due to bug 1924211

Marking as Verified

Comment 15 errata-xmlrpc 2021-05-19 09:18:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

