Created attachment 1725706 [details]
deploy olm file

Description of problem (please be detailed as possible and provide log snippets):
Failed to encrypt OSDs on an OCS 4.6 installation (via UI).

Version of all relevant components (if applicable):
Provider: VMware
OCP Version: 4.6.0-0.nightly-2020-10-31-214252

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
yes

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Verify OCP status: [PASS]

2. Deploy OLM (on OCP 4.6, the OCS operator is not shown in OperatorHub without this step).
   image: quay.io/rhceph-dev/ocs-registry:4.6.0-149.ci
   $ oc create -f deploy_olm_install.yaml

3. Install OCS 4.6 and set the toggle to Enabled to enable data encryption on the cluster:
   https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/deploying_openshift_container_storage_on_vmware_vsphere/index?lb_target=stage

4. Verify OCS installation: [PASS]

5. Verify the OSDs are encrypted: [FAILED]
   a. Get the node where the OSD runs:
      $ oc get pods -n openshift-storage -o wide | grep -i osd
   b. Debug into the node and run lsblk:
      $ oc debug node/compute-0
      sh-4.2# chroot /host /bin/bash
      [root@compute-0 /]# lsblk

Note: When using this ocs-ci config file for installation, the OSDs are encrypted:
https://github.com/red-hat-storage/ocs-ci/blob/master/conf/ocsci/encryption_at_rest.yaml

Actual results (sdc has no dm-crypt child device):

[root@compute-0 /]# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0                          7:0    0  512G  0 loop
sda                            8:0    0  120G  0 disk
|-sda1                         8:1    0  384M  0 part /boot
|-sda2                         8:2    0  127M  0 part /boot/efi
|-sda3                         8:3    0    1M  0 part
`-sda4                         8:4    0 119.5G 0 part
  `-coreos-luks-root-nocrypt 253:0    0 119.5G 0 dm   /sysroot
sdb                            8:16   0   10G  0 disk /var/lib/kubelet/pods/37e8f110-6922-468d-b739-543118c523ee/volumes/kubernetes.io~vsphere-volume/pvc-efb9e421-87fc-4125-a19b-2bd5411045f2
sdc                            8:32   0  512G  0 disk
rbd0                         252:0    0   50G  0 disk /var/lib/kubelet/pods/3bc1614e-f4ff-40f6-97d5-458d686331fd/volumes/kubernetes.io~csi/pvc-db9f3217-30e0-4992-8d99-52e2db99508c/mount

Expected results (sdc carries a crypt device, as with ocs-deviceset-0-data-0-gmxhm-block-dmcrypt below):

[root@compute-0 /]# lsblk
NAME                                           MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
loop0                                            7:0    0  256G  0 loop
sda                                              8:0    0  120G  0 disk
|-sda1                                           8:1    0  384M  0 part  /boot
|-sda2                                           8:2    0  127M  0 part  /boot/efi
|-sda3                                           8:3    0    1M  0 part
`-sda4                                           8:4    0 119.5G 0 part
  `-coreos-luks-root-nocrypt                   253:0    0 119.5G 0 dm    /sysroot
sdb                                              8:16   0   10G  0 disk  /var/lib/kubelet/pods/1ea7b3c0-ffde-4068-959d-1d8ba20030ca/volumes/kubernetes.io~vsphere-volume/pvc-8fe02cf8-5a3f-47e3-9373-a773a2a5966e
sdc                                              8:32   0  256G  0 disk
`-ocs-deviceset-0-data-0-gmxhm-block-dmcrypt   253:1    0  256G  0 crypt
rbd0                                           252:0    0   40G  0 disk  /var/lib/kubelet/pods/5d0a1b12-6686-4b2a-97bd-b9ca140190c6/volumes/kubernetes.io~csi/pvc-ce5d64ad-21bf-4ba4-b399-3a2b5da31594/mount
rbd1                                           252:16   0   40G  0 disk  /var/lib/kubelet/pods/634e50a9-0eb6-4cfa-ac5a-369bcdb02487/volumes/kubernetes.io~csi/pvc-57f369aa-da05-41e4-b9a9-0fb972928d12/mount

Additional info:
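For a quicker spot check across all OSD nodes, a loop like the following can be used instead of debugging each node by hand (a sketch, not part of the original report; it assumes the standard app=rook-ceph-osd pod label and that oc debug works against the nodes):

# List the nodes hosting OSD pods, then look for dm-crypt devices on each.
for node in $(oc get pods -n openshift-storage -l app=rook-ceph-osd \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u); do
  echo "== $node =="
  # Encrypted OSDs show up as TYPE=crypt (names ending in -block-dmcrypt).
  oc debug node/"$node" -- chroot /host lsblk -o NAME,TYPE 2>/dev/null \
    | grep crypt || echo "no dm-crypt device found"
done

An empty result on every node is exactly the failure reported above.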
Proposing as a blocker, as this needs investigation to root cause. The cluster is still up for troubleshooting; please let us know if anything else is needed.
Logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/
(In reply to Oded from comment #3)
> Logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/

Looking at the logs, it does not seem that the UI set the Encryption flag in the storagecluster yaml:

http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/must-gather.local.6710228411716741450/quay-io-rhceph-dev-ocs-must-gather-sha256-9bac3a455e1d0f8fe880798e00f9b970068db95e938b651a2d4521fee7e51fb8/namespaces/openshift-storage/oc_output/storagecluster

(search for "Encryption:": it's empty, but should be `Enable: true` if enabled)

So it seems to be either a user error or a bug in the UI. But since it worked from the UI in another case, I think the UI folks need to look.
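As a side note for anyone re-running this check on a live cluster, the flag can be read straight from the CR rather than from must-gather (a sketch; the spec.encryption.enable path matches the OCS 4.6 StorageCluster CRD, and the resource name assumes the default ocs-storagecluster):

$ oc get storagecluster ocs-storagecluster -n openshift-storage \
    -o jsonpath='{.spec.encryption.enable}'

Empty output, as in this cluster's must-gather, means encryption was never requested; it should print "true" on a correctly configured cluster.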
For completeness: the cephcluster / storageClassDeviceSets also didn't get the encryption flag, so the OCS operator processed its input correctly:

http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1893626/must-gather.local.6710228411716741450/quay-io-rhceph-dev-ocs-must-gather-sha256-9bac3a455e1d0f8fe880798e00f9b970068db95e938b651a2d4521fee7e51fb8/ceph/namespaces/openshift-storage/ceph.rook.io/cephclusters/ocs-storagecluster-cephcluster.yaml (line 344+)
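The corresponding downstream flag can be checked the same way (again a sketch; the per-device-set encrypted field is what Rook's CephCluster CRD uses, and the cluster name assumes the default):

$ oc get cephcluster ocs-storagecluster-cephcluster -n openshift-storage \
    -o jsonpath='{.spec.storage.storageClassDeviceSets[*].encrypted}'

This should print "true" once per device set when the encryption setting is propagated.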
@madam
I created a doc that describes my test procedure:
https://docs.google.com/document/d/1-E9wKQ599PrUPq7y1BbuVWIBpcGlYWgsS2vsMzLij7k/edit
(In reply to Oded from comment #7)
> @madam
> I created a doc that describes my test procedure:
> https://docs.google.com/document/d/1-E9wKQ599PrUPq7y1BbuVWIBpcGlYWgsS2vsMzLij7k/edit

Looking at the steps and screenshots in this doc, it doesn't seem like Oded missed anything. Hence, could it be that the encryption toggle did NOT really work for the Internal cluster but worked for Internal - Attached? It would help to take a look. In the meanwhile, we will try to test the same again.

Thanks, Oded, for documenting your steps; really helpful.
Found it: the issue is in the UI.
(In reply to Bipul Adhikari from comment #9)
> Found it: the issue is in the UI.

Thanks Bipul! But we should not just change the product to OCP, since we lose tracking from OCS that way. Instead, we should clone it into OCP, keeping the tracking bug in OCS.
Bug fixed.

Provider: VMware
OCP Version: 4.6.0-0.nightly-2020-11-07-035509

Test process:

1. Install the OCS operator (ocs-operator.v4.6.0-156.ci) via the UI:
   https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.6/html-single/deploying_openshift_container_storage_on_vmware_vsphere/index?lb_target=stage

2. Check all pods in the openshift-storage namespace.

3. Check Ceph health:
   sh-4.4# ceph health
   HEALTH_OK

4. Get the clusterserviceversions:
   $ oc get clusterserviceversions -n openshift-storage
   NAME                         DISPLAY                       VERSION        REPLACES   PHASE
   ocs-operator.v4.6.0-156.ci   OpenShift Container Storage   4.6.0-156.ci              Succeeded

5. Verify the OSDs are encrypted (each sdc now carries a -block-dmcrypt crypt device):

[root@compute-0 /]# lsblk
NAME                                                MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
loop1                                                 7:1    0  512G  0 loop
sda                                                   8:0    0  120G  0 disk
|-sda1                                                8:1    0  384M  0 part  /boot
|-sda2                                                8:2    0  127M  0 part  /boot/efi
|-sda3                                                8:3    0    1M  0 part
`-sda4                                                8:4    0 119.5G 0 part
  `-coreos-luks-root-nocrypt                        253:0    0 119.5G 0 dm    /sysroot
sdb                                                   8:16   0   10G  0 disk  /var/lib/kubelet/pods/e5f97334-d7ae-4b19-ac05-f6e6e7d6546a/volumes/kubernetes.io~vsphere-volume/pvc-6efc4210-1468-4491-8f03-0dd2b16b4826
sdc                                                   8:32   0  512G  0 disk
`-ocs-deviceset-thin-0-data-0-882rx-block-dmcrypt   253:1    0  512G  0 crypt

[root@compute-1 /]# lsblk
NAME                                                MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
loop0                                                 7:0    0  512G  0 loop
sda                                                   8:0    0  120G  0 disk
|-sda1                                                8:1    0  384M  0 part  /boot
|-sda2                                                8:2    0  127M  0 part  /boot/efi
|-sda3                                                8:3    0    1M  0 part
`-sda4                                                8:4    0 119.5G 0 part
  `-coreos-luks-root-nocrypt                        253:0    0 119.5G 0 dm    /sysroot
sdb                                                   8:16   0   10G  0 disk  /var/lib/kubelet/pods/250f98fe-7a37-4eed-b2b9-339f6809d749/volumes/kubernetes.io~vsphere-volume/pvc-0f1d0694-f8ae-46d0-85bf-57f5fa40de8e
sdc                                                   8:32   0  512G  0 disk
`-ocs-deviceset-thin-1-data-0-7v8nd-block-dmcrypt   253:1    0  512G  0 crypt

[root@compute-2 /]# lsblk
NAME                                                MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
loop0                                                 7:0    0  512G  0 loop
sda                                                   8:0    0  120G  0 disk
|-sda1                                                8:1    0  384M  0 part  /boot
|-sda2                                                8:2    0  127M  0 part  /boot/efi
|-sda3                                                8:3    0    1M  0 part
`-sda4                                                8:4    0 119.5G 0 part
  `-coreos-luks-root-nocrypt                        253:0    0 119.5G 0 dm    /sysroot
sdb                                                   8:16   0   10G  0 disk  /var/lib/kubelet/pods/1cefae24-89cd-4aeb-a44b-93685f61402b/volumes/kubernetes.io~vsphere-volume/pvc-9d8afb2e-a884-49f2-ae2b-2c1faaa02118
sdc                                                   8:32   0  512G  0 disk
`-ocs-deviceset-thin-2-data-0-bfcmm-block-dmcrypt   253:1    0  512G  0 crypt
rbd0                                                252:0    0   50G  0 disk  /var/lib/kubelet/pods/8485563d-6359-4d2e-b308-1efb39ab3cfc/volumes/kubernetes.io~csi/pvc-b8a87734-dd0d-4df6-81cf-dcfb6030c909/mount
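One extra sanity check that can be run on a node (not part of the original verification; the mapper name is copied from the compute-0 output above) is to inspect the device-mapper target with cryptsetup to confirm it is an active LUKS mapping:

[root@compute-0 /]# cryptsetup status ocs-deviceset-thin-0-data-0-882rx-block-dmcrypt

The status output should report the mapping as active and of a LUKS type; lsblk's TYPE=crypt already implies a dm-crypt target, so this is only a belt-and-braces check.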
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.4 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4987
*** Bug 1903413 has been marked as a duplicate of this bug. ***