1970641 – [GSS][rook] external kms CA cert secret not functional with curl's -capath as implemented today

Keywords:
Status:	VERIFIED
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	rook
Sub Component:
Version:	4.7
Hardware:	All
OS:	All
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	OCS 4.8.0
Assignee:	Sébastien Han
QA Contact:	Neha Berry
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1974399

Reported:	2021-06-10 21:48 UTC by Randy Martinez
Modified:	2024-10-01 18:34 UTC
CC List:	7 users
Fixed In Version:	4.8.0-432.ci
Doc Type:	Bug Fix
Doc Text:	Cause: The certificate chain provided contained one self-signed certificate, and no client certificate or private key was provided. Consequence: curl could not work out how to validate the certificate, especially since the directory storing the certificates had not been processed with openssl's c_rehash, which curl expects when called with --capath. Fix: Call curl with --cacert, which gives the certificate validation we need. Result: Certificates are validated correctly and the encryption key can be retrieved.
Clone Of:
Clones:	1974399
Environment:
Last Closed:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift rook pull 257	0	None	open	Bug 1970641: ceph: use cacert if no client cert/key are present	2021-06-21 16:18:54 UTC
Github	red-hat-storage ocs-ci pull 4938	0	None	Merged	Deploy KMS with CA Certificate[without Client Certificate and Client Private Key]	2021-11-22 05:45:33 UTC
Github	rook rook pull 8157	0	None	open	ceph: use cacert if no client cert/key are present	2021-06-21 13:06:32 UTC
Red Hat Bugzilla	1970583	1	urgent	CLOSED	[GSS][rook] hashicorp vault v2 not supported in current release	2024-10-01 18:34:44 UTC

Comment 5 Travis Nielsen 2021-06-14 16:58:53 UTC

The complete solution to this and related vault issues is to call the vault binary for the configuration instead of trying to do it with curl commands. Before we can use the vault binary (200MB), we really need a complete design. There are several options, including:
1. Add vault to the RHCS image (increases the image size for all scenarios, not just those where vault is used)
2. Use vault as a separate container (e.g. an init container). If needed, copy the binary from the vault image to the RHCS image. For upstream, I see a vault image already exists on dockerhub, which would work. For downstream, we would need the build team to chime in on what it would take to package it, since we don't use upstream images in the product.

I'd really like to see the 2nd approach work, but I need to discuss it more with Seb.

That full solution really needs to wait for 4.9; we are too late for 4.8. For 4.8 we need to find a more narrowly scoped fix to unblock the scenario.

Comment 6 Michael Adam 2021-06-14 18:04:33 UTC

@Travis, apart from the conceptually good and complete solution, is there a workaround that could be applied on an existing system?

Comment 8 Travis Nielsen 2021-06-14 18:44:36 UTC

Randy, thanks for confirming the workaround. So you were able to manually update the osd pod spec by reverting the changes in this PR:
 https://github.com/rook/rook/pull/7298

Before we revert that PR in 4.8, I'd like Seb's input. That PR had a purpose and it may cause other issues if we revert it.

Comment 10 Sébastien Han 2021-06-21 12:57:17 UTC

Unfortunately, if we revert, we break https://bugzilla.redhat.com/show_bug.cgi?id=1931839, so that's not possible.
Instead, I'm working on a small patch that should fix the issue.

Honestly, I don't know what's going on. Somehow we never had to run c_rehash on /etc/vault to get this working with a fully signed fullchain.pem, client cert, and client key.
Reading the case, it looks like one of the certificates in the chain was self-signed, which might be the root cause of this.

Unfortunately, c_rehash cannot be run on the directory with openssl for various reasons (binary not available, permissions, etc.).
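
For anyone debugging a similar setup, here is a minimal sketch of how one might check whether a directory has been processed by c_rehash. It relies on the fact that c_rehash creates entries named as eight hex digits, a dot, and a counter (for example 9d66eef0.0). The function name looks_rehashed is hypothetical, and the check only looks at file names; it does not confirm that the hashes match the certificates.

```python
import os
import re
import tempfile

# Entries created by c_rehash look like "9d66eef0.0":
# eight lowercase hex digits, a dot, then a numeric counter.
HASHED_NAME = re.compile(r"^[0-9a-f]{8}\.\d+$")


def looks_rehashed(directory):
    """Return True if the directory holds at least one hash-named entry.

    Hypothetical helper: a name-only heuristic, not a validation of the hashes.
    """
    return any(HASHED_NAME.match(name) for name in os.listdir(directory))


# Usage: a directory holding only a PEM bundle has no hash-named entries.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "fullchain.pem"), "w").close()
    print(looks_rehashed(d))
```

A directory that holds only a fullchain.pem, as in this case, fails the check, which is consistent with --capath lookups not finding the CA.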

The current workaround is the one already provided by Randy in https://bugzilla.redhat.com/show_bug.cgi?id=1970641#c7. Not ideal but better than nothing.
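
The idea in the linked PR title ("use cacert if no client cert/key are present") can be sketched roughly as below. This is an illustration in Python, not the actual Rook code: the function name curl_tls_args and the file names are hypothetical, and keeping --capath when a client cert and key are supplied is my assumption based on the PR title, not something confirmed in this bug. The curl options --cacert, --capath, --cert and --key are real curl flags.

```python
import os


def curl_tls_args(ca_cert, client_cert=None, client_key=None):
    """Build the TLS-related curl arguments (hypothetical helper).

    Without a client cert/key, point curl at the CA bundle file directly
    with --cacert, so no c_rehash-processed directory is needed.
    """
    if client_cert and client_key:
        # Assumption: the earlier --capath behaviour is kept for this case.
        return [
            "--capath", os.path.dirname(ca_cert),
            "--cert", client_cert,
            "--key", client_key,
        ]
    return ["--cacert", ca_cert]


# Usage: only a CA bundle is available, as in the scenario from this bug.
print(curl_tls_args("/etc/vault/fullchain.pem"))
```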

Comment 11 Sébastien Han 2021-06-21 13:06:34 UTC

I'm moving the severity to high since there is some capacity to produce and a workaround is available.

Comment 14 Sébastien Han 2021-06-21 16:20:50 UTC

To verify this BZ:

1. Configure a cluster with cluster-wide encryption and a signed certificate, providing the full chain of certificates as a fullchain.pem in the VAULT_CACERT section of the UI
2. Do NOT use a client key or cert
3. Deploy the cluster
4. Verify that the OSDs come up (normal encryption verification)
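
Before pasting the bundle for step 1, a quick sanity check that fullchain.pem really contains more than one certificate may help. The sketch below is only an illustration: the function name count_certificates is hypothetical, and it counts PEM certificate blocks by their BEGIN/END markers without parsing or validating them.

```python
import re

# Matches one PEM certificate block, from its BEGIN marker to its END marker.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_certificates(pem_text):
    """Count PEM certificate blocks in a bundle (hypothetical helper).

    This only looks at the markers; it does not check signatures or order.
    """
    return len(PEM_BLOCK.findall(pem_text))


# Usage with placeholder content standing in for real base64 data.
sample = (
    "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
)
print(count_certificates(sample))
```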
