Bug 1915445 - Uninstall 4.7: Storagecluster deletion stuck on a partially created KMS enabled OCS cluster + support TLS configuration for KMS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Romy Ayalon
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks: 1916850
Reported: 2021-01-12 16:31 UTC by Neha Berry
Modified: 2021-05-19 09:18 UTC (History)

Fixed In Version: v4.7.0-237.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1916850 (view as bug list)
Environment:
Last Closed: 2021-05-19 09:17:47 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github noobaa noobaa-operator pull 522 0 None closed Backport to 5.7 2021-02-17 13:37:56 UTC
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:18:12 UTC

Description Neha Berry 2021-01-12 16:31:02 UTC
Description of problem (please be detailed as possible and provide log snippets):
===================================================================
Storage cluster deletion has been stuck for hours, waiting on NooBaa resources, in a partially installed OCS 4.7 cluster with KMS encryption enabled. Due to deployment issues, the OSDs could not be created, so the db-noobaa-db-pg-0 PVC stayed in Pending state. Could this PVC be the resource whose deletion NooBaa is waiting on?

Snip from ocs-operator logs
-----------------------------

{"level":"info","ts":1610467637.3470006,"logger":"controllers.StorageCluster","msg":"Uninstall in progress","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Status":"Uninstall: Waiting on NooBaa system to be deleted"}
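For anyone triaging a similar hang, the uninstall status can be pulled out of the ocs-operator log programmatically. The following is a minimal sketch, assuming the log lines are JSON objects shaped like the one above; the function name `uninstall_status` is hypothetical and not part of any OCS tooling.

```python
import json
from datetime import datetime, timezone


def uninstall_status(log_lines):
    """Yield (UTC timestamp, status) for each "Uninstall in progress" record."""
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or non-JSON lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated records
        if rec.get("msg") != "Uninstall in progress":
            continue
        # "ts" is assumed to be seconds since the Unix epoch, as in the snippet above.
        when = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
        yield when.strftime("%Y-%m-%d %H:%M:%S"), rec.get("Status", "")


# The log line quoted in this report, used as sample input.
sample = (
    '{"level":"info","ts":1610467637.3470006,'
    '"logger":"controllers.StorageCluster","msg":"Uninstall in progress",'
    '"Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster",'
    '"Status":"Uninstall: Waiting on NooBaa system to be deleted"}'
)

for when, status in uninstall_status([sample]):
    print(when, status)
```

Feeding it the full operator log (for example, the output of `oc logs` saved to a file) shows how long the operator has been reporting the same wait condition.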

======= PVC ==========
NAME                                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                   Pending                                                                        ocs-storagecluster-ceph-rbd   4h




>> Details and background: We tried to install OCS with KMS encryption enabled from the UI. In this attempt, we hit the issue in [1]. To work around it, we added [VAULT_SKIP_VERIFY: "true"] to the configmap. The deployment then progressed a little further, but the OSDs failed to come up due to the following error:

[1] Bug 1915202 - Can not configure KMS with unknown CA certificate

Snip from rook-op logs
-----
2021-01-12 12:14:20.978135 E | op-osd: failed to store secret. failed to init vault kms: failed to initialize vault secret store: Error making API request.

...
2021-01-12 12:14:21.375825 E | op-osd: failed to store secret. failed to init vault kms: failed to initialize vault secret store: Error making API request.

URL: GET https://10.0.106.147:8200/v1/sys/mounts
Code: 403. Errors:

* permission denied
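The 403 above can be reproduced outside rook to confirm that it is a token or policy problem rather than a TLS one. This is a rough sketch only: it assumes a token is sent in the `X-Vault-Token` header, and which token the operator actually used is not shown in this report. The helper names are hypothetical.

```python
import ssl
import urllib.error
import urllib.request


def build_mounts_request(vault_addr, token):
    """Build the same GET /v1/sys/mounts request that fails in the rook log."""
    url = vault_addr.rstrip("/") + "/v1/sys/mounts"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def tls_context(skip_verify):
    """Return a TLS context; skip_verify mirrors VAULT_SKIP_VERIFY: "true"."""
    ctx = ssl.create_default_context()
    if skip_verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(vault_addr, token, skip_verify=False):
    """Return the HTTP status code of the sys/mounts call (403 = permission denied)."""
    req = build_mounts_request(vault_addr, token)
    try:
        with urllib.request.urlopen(req, context=tls_context(skip_verify), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


# Example call (needs network access to the Vault server and a real token):
# print(probe("https://10.0.106.147:8200", "<token>", skip_verify=True))
```

A 403 from this probe with certificate verification skipped would suggest the token lacks permission on `sys/mounts`, independent of the CA issue in Bug 1915202.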





Version of all relevant components (if applicable):
===================================================================
OCP = 4.7.0-0.nightly-2021-01-07-034013
OCS = ocs-operator.v4.7.0-230.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
===================================================================
Yes. Unable to uninstall and reinstall.

Is there any workaround available to the best of your knowledge?
===================================================================
No idea

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
===================================================================
4

Is this issue reproducible?
===================================================================
Not sure

Can this issue reproduce from the UI?
===================================================================
Not sure

If this is a regression, please provide more details to justify this:
===================================================================
Not sure

Steps to Reproduce:
===================================================================
1. Install OCP 4.7 on vmware
2. Install the OCS 4.7 operator and then click Create Storage Cluster
3. In the configure section - enable cluster-wide encryption and add the KMS details from external vault server. 
4. Click Create in Review and Create Page
5. If you hit Bug 1915202, edit the configmap below to add [VAULT_SKIP_VERIFY: "true"] 
6. Check whether the install succeeds. In our case OSD creation still failed due to KMS-related permission denied errors
7. The noobaa-db-pg-0 PVC stays in pending state
8. Try to uninstall OCS by deleting the Storagecluster from the UI or CLI. Make sure no extra OBCs or PVCs exist apart from the OSD/MON/NooBaa DB PVCs.


The configmap "ocs-kms-connection-details" was edited as follows:

data:
  KMS_PROVIDER: vault
  KMS_SERVICE_NAME: vault
  VAULT_ADDR: https://10.0.106.147:8200
  VAULT_BACKEND_PATH: ""
  VAULT_NAMESPACE: ""
  VAULT_SKIP_VERIFY: "true"
  VAULT_TLS_SERVER_NAME: ""



Actual results:
===================================================================
Storagecluster deletion has been stuck for hours

$ oc get storagecluster
NAME                 AGE     PHASE      EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   4h22m   Deleting              2021-01-12T12:04:45Z   4.7.0

deletionTimestamp: "2021-01-12T15:07:07Z"
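A resource that has a deletionTimestamp but is still present is normally being held by its finalizers. The sketch below lists them for the resources involved here. It assumes `oc` is on the PATH and logged in, and that `storagecluster`, `noobaa` and `pvc` are valid resource names on the cluster; the function names are hypothetical.

```python
import json
import subprocess


def pending_finalizers(obj):
    """Return the finalizers still set on an object that is marked for deletion."""
    meta = obj.get("metadata", {})
    if "deletionTimestamp" not in meta:
        return []  # not being deleted, so nothing is blocking
    return list(meta.get("finalizers", []))


def fetch(kind, namespace="openshift-storage"):
    """Fetch all objects of a kind as parsed JSON via `oc get ... -o json`."""
    out = subprocess.run(
        ["oc", "get", kind, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("items", [])


def report(kinds=("storagecluster", "noobaa", "pvc")):
    """Print every object that is stuck in deletion together with its finalizers."""
    for kind in kinds:
        for obj in fetch(kind):
            finalizers = pending_finalizers(obj)
            if finalizers:
                print(kind, obj["metadata"]["name"], finalizers)


# Call report() on a workstation with cluster access to see what is blocking.
```

This only shows which objects are waiting; it does not remove any finalizer.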


Expected results:
===================================================================
Uninstall should succeed, especially for clusters which failed proper deployment due to various issues and need to be cleaned up.

Comment 4 Mudit Agarwal 2021-01-13 10:09:29 UTC
This is being looked at by the NooBaa team (https://chat.google.com/room/AAAAREGEba8/_IBytsgj4uo), moving it there.

Comment 6 Elad 2021-01-13 19:20:17 UTC
Marking as a regression because uninstall used to work up until now

Comment 8 Shay Rozen 2021-01-13 20:01:04 UTC
I've noticed that when KMS is configured, must-gather takes hours. Probably the same issue.

Comment 13 Mudit Agarwal 2021-01-15 12:37:11 UTC
Yeah, sure. I guess this issue already covers the NooBaa uninstall. You can create a separate one for cephobjectstore.

Comment 19 Filip Balák 2021-02-04 11:47:46 UTC
Deleting the storage cluster works as expected both when KMS is configured correctly and when the cluster installation failed due to misconfigured KMS certificates. --> VERIFIED

Tested with:
ocs v4.7.0-250.ci

Comment 22 errata-xmlrpc 2021-05-19 09:17:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

