Bug 1915445

Summary: Uninstall 4.7: Storagecluster deletion stuck on a partially created KMS enabled OCS cluster + support TLS configuration for KMS
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Neha Berry <nberry>
Component: Multi-Cloud Object Gateway
Assignee: Romy Ayalon <rayalon>
Status: CLOSED ERRATA
QA Contact: Filip Balák <fbalak>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.7
CC: ebenahar, etamir, madam, muagarwa, nbecker, ocs-bugs, srozen
Target Milestone: ---
Keywords: AutomationBackLog, Regression
Target Release: OCS 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v4.7.0-237.ci
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Clones: 1916850
Environment:
Last Closed: 2021-05-19 09:17:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1916850

Description Neha Berry 2021-01-12 16:31:02 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
===================================================================
Storage cluster deletion has been stuck for hours, waiting on NooBaa resources, in a partially installed OCS 4.7 cluster with KMS encryption enabled. Because of the KMS errors described below, the OSDs could not be created, and hence the noobaa-db-pg-0 PVC was stuck in Pending state. Could this PVC be the resource NooBaa is waiting on for deletion?

Snip from ocs-operator logs
-----------------------------

{"level":"info","ts":1610467637.3470006,"logger":"controllers.StorageCluster","msg":"Uninstall in progress","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Status":"Uninstall: Waiting on NooBaa system to be deleted"}

======= PVC ==========
NAME                                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                   Pending                                                                        ocs-storagecluster-ceph-rbd   4h




>> Details and background: We tried to install OCS with KMS encryption enabled from the UI. In this attempt, we hit the issue described in [1]. To work around it, we added [VAULT_SKIP_VERIFY: "true"] to the configmap. The deployment progressed a little further, but the OSDs failed to come up due to the following error:

[1] Bug 1915202 - Can not configure KMS with unknown CA certificate

Snip from rook-op logs
-----
2021-01-12 12:14:20.978135 E | op-osd: failed to store secret. failed to init vault kms: failed to initialize vault secret store: Error making API request.

...
2021-01-12 12:14:21.375825 E | op-osd: failed to store secret. failed to init vault kms: failed to initialize vault secret store: Error making API request.

URL: GET https://10.0.106.147:8200/v1/sys/mounts
Code: 403. Errors:

* permission denied
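
The failing call can be reproduced outside the operator with a short sketch like the one below, which sends the same GET /v1/sys/mounts request to Vault. This is not the rook code path. The function names and the token value are hypothetical; X-Vault-Token is the standard Vault token header. A 403 from this endpoint usually means the token is missing, invalid, or bound to a policy that does not allow reading sys/mounts.

```python
import ssl
import urllib.error
import urllib.request


def build_mounts_request(vault_addr: str, token: str) -> urllib.request.Request:
    """Build the GET /v1/sys/mounts request seen in the rook operator log."""
    url = vault_addr.rstrip("/") + "/v1/sys/mounts"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def tls_context(skip_verify: bool) -> ssl.SSLContext:
    """Return a TLS context; skip_verify mirrors VAULT_SKIP_VERIFY: "true"."""
    ctx = ssl.create_default_context()
    if skip_verify:
        # Insecure: disables certificate and hostname checks (workaround only).
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def mounts_status(vault_addr: str, token: str, skip_verify: bool = False) -> int:
    """Send the request and return the HTTP status code (403 means permission denied)."""
    req = build_mounts_request(vault_addr, token)
    try:
        with urllib.request.urlopen(req, context=tls_context(skip_verify), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Address taken from this report; the token is a placeholder.
    print(mounts_status("https://10.0.106.147:8200", "<vault-token>", skip_verify=True))
```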





Version of all relevant components (if applicable):
===================================================================
OCP = 4.7.0-0.nightly-2021-01-07-034013
OCS = ocs-operator.v4.7.0-230.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
===================================================================
Yes. We are unable to uninstall and re-install OCS.

Is there any workaround available to the best of your knowledge?
===================================================================
No idea

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
===================================================================
4

Is this issue reproducible?
===================================================================
Not sure

Can this issue be reproduced from the UI?
===================================================================
Not sure

If this is a regression, please provide more details to justify this:
===================================================================
Not sure

Steps to Reproduce:
===================================================================
1. Install OCP 4.7 on VMware
2. Install the OCS 4.7 operator and then click on Create Storage Cluster
3. In the Configure section, enable cluster-wide encryption and add the KMS details of the external Vault server.
4. Click Create on the Review and Create page
5. If you hit Bug 1915202, edit the configmap shown below to add [VAULT_SKIP_VERIFY: "true"]
6. Check whether the install succeeds. In our case, OSD creation still failed due to KMS-related "permission denied" errors
7. The noobaa-db-pg-0 PVC stays in Pending state
8. Try to uninstall OCS by deleting the StorageCluster from the UI or CLI. Make sure no extra OBCs or PVCs exist apart from the OSD/MON/NooBaa DB PVCs.


The configmap "ocs-kms-connection-details" was edited as follows:

data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://10.0.106.147:8200
  VAULT_BACKEND_PATH: ""
  VAULT_NAMESPACE: ""
  VAULT_SKIP_VERIFY: "true"
  VAULT_TLS_SERVER_NAME: ""



Actual results:
===================================================================
StorageCluster deletion has been stuck for hours

$ oc get storagecluster
NAME                 AGE     PHASE      EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   4h22m   Deleting              2021-01-12T12:04:45Z   4.7.0

deletionTimestamp: "2021-01-12T15:07:07Z"
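
A resource that stays in Deleting typically has a deletionTimestamp set while one or more finalizers are still present. The sketch below lists such resources from a Kubernetes List object, for example the parsed output of `oc get <kind> -n openshift-storage -o json`. The function names are hypothetical, and the finalizer value in the sample is a placeholder, not the real finalizer used by OCS.

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes RFC 3339 timestamp such as 2021-01-12T15:07:07Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pending_deletions(resource_list: dict, now: datetime) -> list:
    """Return items that have a deletionTimestamp, with finalizers and elapsed time."""
    stuck = []
    for item in resource_list.get("items", []):
        meta = item.get("metadata", {})
        stamp = meta.get("deletionTimestamp")
        if not stamp:
            continue  # not being deleted
        stuck.append({
            "kind": item.get("kind", ""),
            "name": meta.get("name", ""),
            "finalizers": meta.get("finalizers", []),
            "pending_for": now - parse_k8s_time(stamp),
        })
    return stuck


# Sample input: names and timestamp from this report, finalizer is a placeholder.
SAMPLE = {
    "kind": "List",
    "items": [
        {
            "kind": "StorageCluster",
            "metadata": {
                "name": "ocs-storagecluster",
                "deletionTimestamp": "2021-01-12T15:07:07Z",
                "finalizers": ["example.com/placeholder-finalizer"],
            },
        },
        {"kind": "PersistentVolumeClaim", "metadata": {"name": "db-noobaa-db-pg-0"}},
    ],
}

if __name__ == "__main__":
    # Reference time: when this report was filed.
    report_time = parse_k8s_time("2021-01-12T16:31:02Z")
    for entry in pending_deletions(SAMPLE, report_time):
        print(entry)
```

Running the same check against each dependent kind (NooBaa system, PVCs) is one way to see which object is actually holding up the StorageCluster deletion.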


Expected results:
===================================================================
Uninstall should succeed, especially for clusters which failed proper deployment due to various issues and need to be cleaned up.

Comment 4 Mudit Agarwal 2021-01-13 10:09:29 UTC
This is being looked at by the NooBaa team (https://chat.google.com/room/AAAAREGEba8/_IBytsgj4uo), moving it there.

Comment 6 Elad 2021-01-13 19:20:17 UTC
Marking as a regression because uninstall used to work up until now

Comment 8 Shay Rozen 2021-01-13 20:01:04 UTC
I've noticed that when KMS is configured, must-gather takes hours. Probably the same issue.

Comment 13 Mudit Agarwal 2021-01-15 12:37:11 UTC
Yeah, sure. I guess this issue is already tracked for NooBaa uninstall. You can create one for cephobjectstore.

Comment 19 Filip Balák 2021-02-04 11:47:46 UTC
Deleting the storage cluster works as expected when KMS is set up correctly, and also when the cluster installation failed due to misconfiguration of KMS certificates. --> VERIFIED

Tested with:
ocs v4.7.0-250.ci

Comment 22 errata-xmlrpc 2021-05-19 09:17:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041