1893626 – Failed to encrypt OSDs on OCS4.6 installation (via UI)

Summary: Failed to encrypt OSDs on OCS4.6 installation (via UI)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Console Storage Plugin
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.6.z
Assignee:	Bipul Adhikari
QA Contact:	Oded
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1903413
Depends On:	1894210
Blocks:

Reported:	2020-11-02 08:23 UTC by Oded
Modified:	2020-12-02 05:53 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1894210
Environment:
Last Closed:	2020-11-16 14:37:43 UTC
Target Upstream Version:
Embargoed:

Attachments
deploy olm file (758 bytes, text/plain) 2020-11-02 08:23 UTC, Oded

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift console pull 7089	0	None	closed	[release-4.6] Bug 1893626: Fix Encryption request for OCS	2021-02-04 06:21:25 UTC
Red Hat Product Errata	RHBA-2020:4987	0	None	None	None	2020-11-16 14:38:00 UTC

Description Oded 2020-11-02 08:23:10 UTC

Created attachment 1725706
deploy olm file

Description of problem (please be as detailed as possible and provide log
snippets):
OSDs were not encrypted after installing OCS 4.6 via the UI with the encryption toggle enabled.

Version of all relevant components (if applicable):
Provider: VMware
OCP Version: 4.6.0-0.nightly-2020-10-31-214252

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?
yes

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue reproduce from the UI?
yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:

1. Verify OCP status: [PASS]

2. Deploy OLM (on OCP 4.6, the OCS Operator is not shown in OperatorHub without this command)
image: quay.io/rhceph-dev/ocs-registry:4.6.0-149.ci
$ oc create -f deploy_olm_install.yaml

3. Install OCS 4.6 via the UI
Set the toggle to Enabled to enable data encryption on the cluster.

https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/deploying_openshift_container_storage_on_vmware_vsphere/index?lb_target=stage

4. Verify OCS installation: [PASS]

5. Verify OSDs are encrypted: [FAILED]
a. Get the node where the OSD runs
$ oc get pods -n openshift-storage -o wide | grep -i osd
b. Go to the node and run the "lsblk" command
$ oc debug node/compute-0
sh-4.2# chroot /host /bin/bash
[root@compute-0 /]# lsblk

Note:
When installing with the following ocs-ci configuration instead of the UI, the OSDs are encrypted.
https://github.com/red-hat-storage/ocs-ci/blob/master/conf/ocsci/encryption_at_rest.yaml



Actual results:
[root@compute-0 /]# lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                         7:0    0   512G  0 loop 
sda                           8:0    0   120G  0 disk 
|-sda1                        8:1    0   384M  0 part /boot
|-sda2                        8:2    0   127M  0 part /boot/efi
|-sda3                        8:3    0     1M  0 part 
`-sda4                        8:4    0 119.5G  0 part 
  `-coreos-luks-root-nocrypt
                            253:0    0 119.5G  0 dm   /sysroot
sdb                           8:16   0    10G  0 disk /var/lib/kubelet/pods/37e8f110-6922-468d-b739-543118c523ee/volumes/kubernetes.io~vsphere-volume/pvc-efb9e421-87fc-4125-a19b-2bd5411045f2
sdc                           8:32   0   512G  0 disk 
rbd0                        252:0    0    50G  0 disk /var/lib/kubelet/pods/3bc1614e-f4ff-40f6-97d5-458d686331fd/volumes/kubernetes.io~csi/pvc-db9f3217-30e0-4992-8d99-52e2db99508c/mount

Expected results:
[root@compute-0 /]# lsblk
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0                        7:0    0   256G  0 loop  
sda                          8:0    0   120G  0 disk  
|-sda1                       8:1    0   384M  0 part  /boot
|-sda2                       8:2    0   127M  0 part  /boot/efi
|-sda3                       8:3    0     1M  0 part  
`-sda4                       8:4    0 119.5G  0 part  
  `-coreos-luks-root-nocrypt
                           253:0    0 119.5G  0 dm    /sysroot
sdb                          8:16   0    10G  0 disk  /var/lib/kubelet/pods/1ea7b3c0-ffde-4068-959d-1d8ba20030ca/volumes/kubernetes.io~vsphere-volume/pvc-8fe02cf8-5a3f-47e3-9373-a773a2a5966e
sdc                          8:32   0   256G  0 disk  
`-ocs-deviceset-0-data-0-gmxhm-block-dmcrypt
                           253:1    0   256G  0 crypt 
rbd0                       252:0    0    40G  0 disk  /var/lib/kubelet/pods/5d0a1b12-6686-4b2a-97bd-b9ca140190c6/volumes/kubernetes.io~csi/pvc-ce5d64ad-21bf-4ba4-b399-3a2b5da31594/mount
rbd1                       252:16   0    40G  0 disk  /var/lib/kubelet/pods/634e50a9-0eb6-4cfa-ac5a-369bcdb02487/volumes/kubernetes.io~csi/pvc-57f369aa-da05-41e4-b9a9-0fb972928d12/mount
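The actual and expected listings differ in one row: in the expected output, the OSD disk has a child device of TYPE `crypt`. Below is a minimal Python sketch of that check. It assumes the plain-text `lsblk` layout shown in this report (MAJ:MIN followed by RM, SIZE, RO, TYPE) and the `-block-dmcrypt` name suffix seen here. The function names are hypothetical and not part of any OCS tooling.

```python
import re

# A device row is recognised by its MAJ:MIN token, e.g. "253:1".
MAJ_MIN = re.compile(r"^\d+:\d+$")


def lsblk_devices(text):
    """Return (name, type) pairs parsed from plain-text `lsblk` output.

    Long names are wrapped by lsblk onto their own line, with the
    remaining columns on the following line; both layouts are handled.
    """
    devices = []
    prev_token = ""
    for line in text.splitlines():
        tokens = line.split()
        for i, tok in enumerate(tokens):
            # Columns after MAJ:MIN are RM, SIZE, RO, TYPE (offset +4).
            if MAJ_MIN.match(tok) and i + 4 < len(tokens):
                raw_name = tokens[i - 1] if i > 0 else prev_token
                # Strip the tree-drawing prefix such as "|-" or "`-".
                devices.append((raw_name.lstrip("`|-"), tokens[i + 4]))
                break
        prev_token = tokens[-1] if tokens else ""
    return devices


def encrypted_osd_devices(text):
    """Names of dm-crypt devices that look like OSD block devices.

    The "-block-dmcrypt" suffix is an assumption taken from the
    device names shown in this bug report.
    """
    return [
        name
        for name, dev_type in lsblk_devices(text)
        if dev_type == "crypt" and name.endswith("-block-dmcrypt")
    ]
```

Fed the output of `lsblk` from a node, `encrypted_osd_devices` returning an empty list corresponds to the failing case above. Matching on the TYPE column rather than on the name avoids a false positive from `coreos-luks-root-nocrypt`, which is of TYPE `dm`.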

Additional info:

Comment 2 Neha Berry 2020-11-02 08:45:55 UTC

Proposing as a blocker, since the root cause needs investigation. The cluster is still up for troubleshooting. Please let us know.

Comment 3 Oded 2020-11-02 10:09:46 UTC

Logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/

Comment 5 Michael Adam 2020-11-03 10:36:34 UTC

(In reply to Oded from comment #3)
> Logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/

Looking at the logs, it does not seem that the UI has set the Encryption flag in the storagecluster yaml:

http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/must-gather.local.6710228411716741450/quay-io-rhceph-dev-ocs-must-gather-sha256-9bac3a455e1d0f8fe880798e00f9b970068db95e938b651a2d4521fee7e51fb8/namespaces/openshift-storage/oc_output/storagecluster

(Search for "Encryption:". It is empty; it should contain `Enable: true` if encryption is enabled.)

So it seems to be either a user error or a bug in the UI.
But since it worked from the UI in another case, I think the UI folks need to look.
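The check described in this comment can be sketched in Python. This assumes the StorageCluster is exported as JSON (for example with `oc get storagecluster -n openshift-storage -o json`) and that the flag lives at `spec.encryption.enable`, the JSON counterpart of the `Encryption:` / `Enable: true` text mentioned above. The helper names are hypothetical.

```python
import json


def encryption_enabled(doc):
    """Return True only if every StorageCluster in `doc` enables encryption.

    `doc` is either a single StorageCluster object or the "List"
    wrapper that `oc get ... -o json` returns for several objects.
    The field path spec.encryption.enable is an assumption.
    """
    items = doc.get("items", []) if doc.get("kind") == "List" else [doc]
    if not items:
        return False
    return all(
        bool(((item.get("spec") or {}).get("encryption") or {}).get("enable"))
        for item in items
    )


def encryption_enabled_from_json(text):
    """Convenience wrapper for JSON text, e.g. captured command output."""
    return encryption_enabled(json.loads(text))
```

An empty or missing `encryption` section, as found in the must-gather here, yields `False`.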

Comment 6 Michael Adam 2020-11-03 10:37:39 UTC

For completeness: the storageclass device set in the cephcluster did not get the encryption flag either, so the OCS operator processed its input correctly:

http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/must-gather.local.6710228411716741450/quay-io-rhceph-dev-ocs-must-gather-sha256-9bac3a455e1d0f8fe880798e00f9b970068db95e938b651a2d4521fee7e51fb8/ceph/namespaces/openshift-storage/ceph.rook.io/cephclusters/ocs-storagecluster-cephcluster.yaml

(line 344+)
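A similar sketch for the CephCluster side lists the device sets that lack the flag. It assumes the object has been loaded as a dictionary (for example from `oc get cephcluster -n openshift-storage -o json`) and that each entry under `spec.storage.storageClassDeviceSets` carries an `encrypted` boolean. Both the field path and the function name are assumptions made for illustration.

```python
def unencrypted_device_sets(cephcluster):
    """Return the names of storageClassDeviceSets without `encrypted: true`.

    `cephcluster` is the CephCluster object parsed into a dict.
    Missing sections are treated as "no device sets".
    """
    storage = (cephcluster.get("spec") or {}).get("storage") or {}
    device_sets = storage.get("storageClassDeviceSets") or []
    return [
        ds.get("name", "<unnamed>")
        for ds in device_sets
        if not ds.get("encrypted")
    ]
```

A non-empty result, together with an unset flag in the StorageCluster, points at the input (here the UI request) rather than at the operator.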

Comment 7 Oded 2020-11-03 12:27:23 UTC

@madam
I created a doc that describes my test procedure
https://docs.google.com/document/d/1-E9wKQ599PrUPq7y1BbuVWIBpcGlYWgsS2vsMzLij7k/edit

Comment 8 Neha Berry 2020-11-03 14:35:55 UTC

(In reply to Oded from comment #7)
> @madam
> I created a doc that describe my test procedure
> https://docs.google.com/document/d/1-
> E9wKQ599PrUPq7y1BbuVWIBpcGlYWgsS2vsMzLij7k/edit

Judging by the steps and screenshot in this doc, it doesn't seem like Oded missed anything.

Hence, could it be that the encryption toggle button did NOT really work for the Internal cluster but worked for Internal Attached? It would help to take a look.

In the meantime, we will try to test the same scenario again.

Thanks, Oded, for documenting your steps. Really helpful.

Comment 9 Bipul Adhikari 2020-11-03 17:11:07 UTC

Found the issue, there's an issue in the UI.

Comment 10 Michael Adam 2020-11-03 18:28:09 UTC

(In reply to Bipul Adhikari from comment #9)
> Found the issue, there's an issue in the UI.

Thanks Bipul!

But we should not just change the product to OCP, since we lose tracking from OCS this way.
Instead, we should clone it into OCP and keep the tracking bug in OCS.

Comment 13 Oded 2020-11-09 20:55:56 UTC

Bug fixed.

Provider: VMware
OCP Version: 4.6.0-0.nightly-2020-11-07-035509

Test Process:
1. Install OCS Operator (ocs-operator.v4.6.0-156.ci) via the UI
https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/deploying_openshift_container_storage_on_vmware_vsphere/index?lb_target=stage

2. Check all pods in the openshift-storage namespace

3. Check Ceph health
sh-4.4# ceph health
HEALTH_OK

4. Get clusterserviceversions
$ oc get clusterserviceversions -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-156.ci   OpenShift Container Storage   4.6.0-156.ci              Succeeded

5. Verify OSDs are encrypted:
[root@compute-0 /]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop1       7:1    0   512G  0 loop  
sda         8:0    0   120G  0 disk  
|-sda1      8:1    0   384M  0 part  /boot
|-sda2      8:2    0   127M  0 part  /boot/efi
|-sda3      8:3    0     1M  0 part  
`-sda4      8:4    0 119.5G  0 part  
  `-coreos-luks-root-nocrypt
          253:0    0 119.5G  0 dm    /sysroot
sdb         8:16   0    10G  0 disk  /var/lib/kubelet/pods/e5f97334-d7ae-4b19-ac05-f6e6e7d6546a/volumes/kubernetes.io~vsphere-volume/pvc-6efc4210-1468-4491-8f03-0dd2b16b4826
sdc         8:32   0   512G  0 disk  
`-ocs-deviceset-thin-0-data-0-882rx-block-dmcrypt
          253:1    0   512G  0 crypt 


[root@compute-1 /]# lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0      7:0    0   512G  0 loop  
sda        8:0    0   120G  0 disk  
|-sda1     8:1    0   384M  0 part  /boot
|-sda2     8:2    0   127M  0 part  /boot/efi
|-sda3     8:3    0     1M  0 part  
`-sda4     8:4    0 119.5G  0 part  
  `-coreos-luks-root-nocrypt
         253:0    0 119.5G  0 dm    /sysroot
sdb        8:16   0    10G  0 disk  /var/lib/kubelet/pods/250f98fe-7a37-4eed-b2b9-339f6809d749/volumes/kubernetes.io~vsphere-volume/pvc-0f1d0694-f8ae-46d0-85bf-57f5fa40de8e
sdc        8:32   0   512G  0 disk  
`-ocs-deviceset-thin-1-data-0-7v8nd-block-dmcrypt
         253:1    0   512G  0 crypt 


[root@compute-2 /]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0       7:0    0   512G  0 loop  
sda         8:0    0   120G  0 disk  
|-sda1      8:1    0   384M  0 part  /boot
|-sda2      8:2    0   127M  0 part  /boot/efi
|-sda3      8:3    0     1M  0 part  
`-sda4      8:4    0 119.5G  0 part  
  `-coreos-luks-root-nocrypt
          253:0    0 119.5G  0 dm    /sysroot
sdb         8:16   0    10G  0 disk  /var/lib/kubelet/pods/1cefae24-89cd-4aeb-a44b-93685f61402b/volumes/kubernetes.io~vsphere-volume/pvc-9d8afb2e-a884-49f2-ae2b-2c1faaa02118
sdc         8:32   0   512G  0 disk  
`-ocs-deviceset-thin-2-data-0-bfcmm-block-dmcrypt
          253:1    0   512G  0 crypt 
rbd0      252:0    0    50G  0 disk  /var/lib/kubelet/pods/8485563d-6359-4d2e-b308-1efb39ab3cfc/volumes/kubernetes.io~csi/pvc-b8a87734-dd0d-4df6-81cf-dcfb6030c909/mount

Comment 15 errata-xmlrpc 2020-11-16 14:37:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.4 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4987

Comment 16 Bipul Adhikari 2020-12-02 05:53:03 UTC

*** Bug 1903413 has been marked as a duplicate of this bug. ***
