Bug 1879072 - Deployment with encryption at rest is failing to bring up OSD pods
Summary: Deployment with encryption at rest is failing to bring up OSD pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Sébastien Han
QA Contact: Petr Balogh
URL:
Whiteboard:
Depends On:
Blocks: 1883927
 
Reported: 2020-09-15 11:50 UTC by Petr Balogh
Modified: 2020-12-17 06:24 UTC (History)

Fixed In Version: 4.6.0-102.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1883927
Environment:
Last Closed: 2020-12-17 06:24:14 UTC
Embargoed:

Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:5605 0 None None None 2020-12-17 06:24:35 UTC

Comment 4 Sébastien Han 2020-09-15 14:28:50 UTC
Yes, I'm looking. Thanks.

Comment 5 Petr Balogh 2020-09-15 15:14:00 UTC
I'm copying the must-gather I collected locally here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-1879072-logs.tar.gz

I see that the logs from the job weren't collected properly.

Comment 6 Sébastien Han 2020-09-16 09:36:33 UTC
Petr, the deployment failed because the Ceph container image does not include the encryption code:

[root@13ceeafe1c5f /]# ceph -v
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)

ceph-volume is missing the encryption flag.

The downstream backport was done as per https://bugzilla.redhat.com/show_bug.cgi?id=1845622#c9
Are we using the RHCS 4.1z2 RC build?
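
The version check above can also be scripted. The following is a minimal sketch, assuming the first line of `ceph -v` always has the shape shown in this comment; the function name `parse_ceph_version` is hypothetical, and the sketch makes no claim about which release first carries the ceph-volume encryption support.

```python
import re
from typing import Optional, Tuple

# Shape of the line printed by `ceph -v`, as seen in this bug:
# "ceph version 14.2.8-91.el8cp (<commit>) nautilus (stable)"
_CEPH_VERSION_RE = re.compile(
    r"^ceph version (?P<version>\d+\.\d+\.\d+)-(?P<release>\d+)\S*\s+"
    r"\((?P<commit>[0-9a-f]+)\)\s+(?P<codename>\w+)"
)


def parse_ceph_version(output: str) -> Optional[Tuple[Tuple[int, ...], int, str]]:
    """Return ((major, minor, patch), downstream release, codename), or None."""
    match = _CEPH_VERSION_RE.match(output.strip())
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return version, int(match.group("release")), match.group("codename")


sample = (
    "ceph version 14.2.8-91.el8cp "
    "(75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)"
)
print(parse_ceph_version(sample))
```

Comparing the parsed tuple against the build that is expected to contain the backport gives a quick pre-deployment check on the image.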

Comment 7 Petr Balogh 2020-09-16 10:09:19 UTC
Hey Sebastien.

We discussed this yesterday on the program call. In this case we need an OCS 4.6 DS build that includes the new image.

I think this now sits with Christina's team, so I'm moving it to the build component.

Comment 8 Mudit Agarwal 2020-09-30 06:18:20 UTC
This can be moved to ON_QA now with the latest 4.6 build.

Comment 11 Petr Balogh 2020-09-30 13:32:14 UTC
Failed QE.

OCP Version:
4.6.0-0.nightly-2020-09-30-052433

Tried with build:
4.6.0-102.ci, which is supposed to have the proper RHCS image.
The build should contain RHCS 4-33, as confirmed by Boris.

Logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr2696-b2972/jnk-pr2696-b2972_20200930T112803/logs/

Job:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/12913/


OSD Pods:
rook-ceph-osd-0-557f59b57d-9tswd                                  0/1     Init:CrashLoopBackOff   6          7m52s
rook-ceph-osd-1-6cc7896876-xjlwk                                  0/1     Init:CrashLoopBackOff   6          7m52s
rook-ceph-osd-2-7d5577c5df-rmfmb                                  0/1     Init:CrashLoopBackOff   6          7m53s
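
A listing like the one above can be filtered for pods that never got past their init containers. This is a minimal sketch, assuming the five-column layout shown here (name, ready, status, restarts, age); `PodStatus`, `parse_pod_lines`, and `stuck_in_init` are hypothetical helper names.

```python
from typing import List, NamedTuple


class PodStatus(NamedTuple):
    name: str
    ready: str
    status: str
    restarts: int
    age: str


def parse_pod_lines(text: str) -> List[PodStatus]:
    """Parse pod listing rows; skip headers and malformed lines."""
    pods = []
    for line in text.splitlines():
        fields = line.split()
        # Expect exactly five whitespace-separated columns with a numeric
        # restart count; anything else (e.g. a header row) is ignored.
        if len(fields) != 5 or not fields[3].isdigit():
            continue
        name, ready, status, restarts, age = fields
        pods.append(PodStatus(name, ready, status, int(restarts), age))
    return pods


def stuck_in_init(pods: List[PodStatus]) -> List[str]:
    """Names of pods whose status shows a failure in an init container."""
    return [pod.name for pod in pods if pod.status.startswith("Init:")]


listing = """\
rook-ceph-osd-0-557f59b57d-9tswd   0/1   Init:CrashLoopBackOff   6   7m52s
rook-ceph-osd-1-6cc7896876-xjlwk   0/1   Init:CrashLoopBackOff   6   7m52s
rook-ceph-osd-2-7d5577c5df-rmfmb   0/1   Init:CrashLoopBackOff   6   7m53s
"""
print(stuck_in_init(parse_pod_lines(listing)))
```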

Describe output for one of the OSD pods:
  Type     Reason                 Age                 From                Message
  ----     ------                 ----                ----                -------
  Normal   Scheduled              <unknown>                               Successfully assigned openshift-storage/rook-ceph-osd-0-557f59b57d-9tswd to compute-2
  Normal   SuccessfulMountVolume  11m                 kubelet, compute-2  MapVolume.MapPodDevice succeeded for volume "pvc-0fb386f8-b2a4-46b4-bbac-16300e41d740" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/volumeDevices/[vsanDatastore] 3c242d5f-6405-9967-713e-e4434bd7dee0/jnk-pr2696-b2972-rg9sv-dynamic-pvc-0fb386f8-b2a4-46b4-bbac-16300e41d740.vmdk"
  Normal   SuccessfulMountVolume  11m                 kubelet, compute-2  MapVolume.MapPodDevice succeeded for volume "pvc-0fb386f8-b2a4-46b4-bbac-16300e41d740" volumeMapPath "/var/lib/kubelet/pods/32c15db1-cc03-41e4-adfb-44bac2fd587b/volumeDevices/kubernetes.io~vsphere-volume"
  Normal   Started                11m                 kubelet, compute-2  Started container encryption-open
  Normal   AddedInterface         11m                 multus              Add eth0 [10.128.2.25/23]
  Normal   Pulled                 11m                 kubelet, compute-2  Container image "quay.io/rhceph-dev/rhceph@sha256:22ea8ee38cd8283f636c2eeb640eb4a1bb744efb18abee114517926f4a03bff9" already present on machine
  Normal   Created                11m                 kubelet, compute-2  Created container encryption-open
  Normal   Started                11m                 kubelet, compute-2  Started container blkdevmapper-encryption
  Normal   Pulled                 11m                 kubelet, compute-2  Container image "quay.io/rhceph-dev/rhceph@sha256:22ea8ee38cd8283f636c2eeb640eb4a1bb744efb18abee114517926f4a03bff9" already present on machine
  Normal   Created                11m                 kubelet, compute-2  Created container blkdevmapper-encryption
  Normal   Pulled                 11m                 kubelet, compute-2  Container image "quay.io/rhceph-dev/rhceph@sha256:22ea8ee38cd8283f636c2eeb640eb4a1bb744efb18abee114517926f4a03bff9" already present on machine
  Normal   Created                11m                 kubelet, compute-2  Created container encrypted-block-status
  Normal   Started                11m                 kubelet, compute-2  Started container encrypted-block-status
  Normal   Started                11m (x3 over 11m)   kubelet, compute-2  Started container expand-encrypted-bluefs
  Normal   Pulled                 11m (x4 over 11m)   kubelet, compute-2  Container image "quay.io/rhceph-dev/rhceph@sha256:22ea8ee38cd8283f636c2eeb640eb4a1bb744efb18abee114517926f4a03bff9" already present on machine
  Normal   Created                11m (x4 over 11m)   kubelet, compute-2  Created container expand-encrypted-bluefs
  Warning  BackOff                95s (x46 over 11m)  kubelet, compute-2  Back-off restarting failed container
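
The events above hint at which init container keeps restarting: only one "Started container" event carries a repeat count. The sketch below counts those events per container, assuming the event layout shown here; the helper names are hypothetical, and a repeat count only suggests a restart loop, it does not establish the root cause.

```python
import re
from typing import Dict, List

# Matches "Started" events, with an optional "(xN over AGE)" repeat marker.
_STARTED_RE = re.compile(
    r"Started\s+\S+(?:\s+\(x(?P<count>\d+) over \S+\))?\s+kubelet, \S+\s+"
    r"Started container (?P<name>\S+)"
)


def count_container_starts(events: str) -> Dict[str, int]:
    """Map container name to the number of times it was started."""
    starts: Dict[str, int] = {}
    for line in events.splitlines():
        match = _STARTED_RE.search(line)
        if match:
            # An event without a repeat marker was seen exactly once.
            starts[match.group("name")] = int(match.group("count") or 1)
    return starts


def restarted_containers(events: str) -> List[str]:
    """Containers started more than once, i.e. likely in a restart loop."""
    return [name for name, n in count_container_starts(events).items() if n > 1]


events = """\
  Normal   Started   11m                kubelet, compute-2  Started container encryption-open
  Normal   Started   11m                kubelet, compute-2  Started container blkdevmapper-encryption
  Normal   Started   11m                kubelet, compute-2  Started container encrypted-block-status
  Normal   Started   11m (x3 over 11m)  kubelet, compute-2  Started container expand-encrypted-bluefs
"""
print(restarted_containers(events))
```

For this pod the only container flagged is expand-encrypted-bluefs, so its logs are the natural next place to look.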


Kubeconfig shared with Sebastien:
 http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr2696-b2972/jnk-pr2696-b2972_20200930T112803/openshift-cluster-dir/auth/kubeconfig

Comment 12 Mudit Agarwal 2020-09-30 14:00:29 UTC
If the build has encryption enabled, then the component needs to be changed to rook.

Comment 13 Petr Balogh 2020-09-30 14:11:29 UTC
Actually, Sebastien asked me to clone this bug to the rook component, so I did:
https://bugzilla.redhat.com/show_bug.cgi?id=1883927

We can close this one now, as the build should contain the proper RHCS image.

Comment 16 errata-xmlrpc 2020-12-17 06:24:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605
