Bug 1970641 - [GSS][rook] external kms CA cert secret not functional with curl's -capath as implemented today
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.7
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: OCS 4.8.0
Assignee: Sébastien Han
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks: 1974399
Reported: 2021-06-10 21:48 UTC by Randy Martinez
Modified: 2024-10-01 18:34 UTC (History)

Fixed In Version: 4.8.0-432.ci
Doc Type: Bug Fix
Doc Text:
Cause: The certificate chain provided contained one self-signed certificate, and no client certificate or private key was provided. Consequence: curl could not validate the certificate, in particular because the directory storing the certificates had not been processed with openssl's c_rehash, which curl expects when it is called with --capath. Fix: curl is now called with --cacert, which gives the certificate validation we need. Result: Certificates are validated correctly and the encryption key can be retrieved.
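The fix described in the Doc Text can be sketched roughly as follows. This is a minimal illustration in Python, not rook's actual code; the function name curl_tls_args and its parameters are hypothetical, and keeping --capath when a client certificate and key are present is an assumption based on the title of the linked pull request.

```python
from typing import List, Optional


def curl_tls_args(
    ca_cert_file: str,
    ca_cert_dir: str,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> List[str]:
    """Return the TLS-related curl arguments for a KMS request.

    Hypothetical helper: the names and the exact branching are assumptions.
    """
    if client_cert and client_key:
        # Assumed existing path: client cert/key plus a certificate directory.
        return ["--capath", ca_cert_dir, "--cert", client_cert, "--key", client_key]
    # No client cert/key: point curl at the CA bundle file directly, which
    # does not require the directory to have been processed by c_rehash.
    return ["--cacert", ca_cert_file]
```

For example, curl_tls_args("/etc/vault/fullchain.pem", "/etc/vault") yields only the --cacert pair; the file path is illustrative.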
Clone Of:
Clones: 1974399
Environment:
Last Closed:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift rook pull 257 | 0 | None | open | Bug 1970641: ceph: use cacert if no client cert/key are present | 2021-06-21 16:18:54 UTC
Github | red-hat-storage ocs-ci pull 4938 | 0 | None | Merged | Deploy KMS with CA Certificate[without Client Certificate and Client Private Key] | 2021-11-22 05:45:33 UTC
Github | rook rook pull 8157 | 0 | None | open | ceph: use cacert if no client cert/key are present | 2021-06-21 13:06:32 UTC
Red Hat Bugzilla | 1970583 | 1 | urgent | CLOSED | [GSS][rook] hashicorp vault v2 not supported in current release | 2024-10-01 18:34:44 UTC

Comment 5 Travis Nielsen 2021-06-14 16:58:53 UTC
The complete solution to this and related vault issues is that we need to call the vault binary for the configuration instead of trying to do it with curl commands. Before we can use the vault binary (200MB), we really need a complete design. There are several options including:
1. Add vault to the RHCS image (increases the image for all scenarios, not just where vault is used)
2. Use vault as a separate container (e.g. an init container). If needed, copy the binary from the vault image to the RHCS image. For upstream, I see a vault image already exists on dockerhub, which would work. For downstream, we would need the build team to chime in on what it would take to package it, since we don't use upstream images in the product.

I'd really like to see the 2nd approach work, but need to discuss more with Seb. 

That full solution really needs to wait for 4.9; we are too late for 4.8. For 4.8 we need to find a more narrowly scoped fix to unblock the scenario.

Comment 6 Michael Adam 2021-06-14 18:04:33 UTC
@Travis, apart from the conceptually good and complete solution, is there a workaround that could be applied on an existing system?

Comment 8 Travis Nielsen 2021-06-14 18:44:36 UTC
Randy, thanks for confirming the workaround. So you were able to manually update the osd pod spec by reverting the changes in this PR:
 https://github.com/rook/rook/pull/7298

Before we revert that PR in 4.8, I'd like Seb's input. That PR had a purpose and it may cause other issues if we revert it.

Comment 10 Sébastien Han 2021-06-21 12:57:17 UTC
Unfortunately, if we revert, we break https://bugzilla.redhat.com/show_bug.cgi?id=1931839 so it's not possible.
Alternatively, I'm working on a small patch that should fix the issue.

Honestly, I don't know what's going on; somehow we never had to run c_rehash on /etc/vault to get this working with a fully signed fullchain.pem, client cert, and client key.
Reading the case, it looks like one of the certificates in the chain was self-signed, which might be the root cause of this.

Unfortunately, c_rehash cannot be run on the directory for various reasons (openssl binary not available, permissions, etc.).
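
For context, --capath expects a directory whose entries are named by certificate subject hash, which is what c_rehash produces. A rough way to check whether a directory such as /etc/vault has been prepared that way is sketched below in Python. The helper name looks_rehashed is hypothetical, and the check only inspects file names (eight hex digits, a dot, and a sequence number); it does not validate any certificate.

```python
import os
import re

# c_rehash creates links named <8 hex digits>.<n>, for example "5ad8a5d6.0".
HASH_LINK = re.compile(r"^[0-9a-f]{8}\.\d+$")


def looks_rehashed(cert_dir: str) -> bool:
    """Return True if cert_dir contains at least one hash-named entry."""
    return any(HASH_LINK.match(name) for name in os.listdir(cert_dir))
```

A directory holding only a fullchain.pem would return False here, which matches the situation described in this comment.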

The current workaround is the one already provided by Randy in https://bugzilla.redhat.com/show_bug.cgi?id=1970641#c7. Not ideal but better than nothing.

Comment 11 Sébastien Han 2021-06-21 13:06:34 UTC
I'm moving the severity to high since there is some capacity to produce and a workaround is available.

Comment 14 Sébastien Han 2021-06-21 16:20:50 UTC
To verify this BZ:

1. configure a cluster with cluster-wide encryption and a signed certificate, providing the full chain of certificates in a fullchain.pem in the VAULT_CACERT section of the UI
2. do NOT use client key or cert
3. deploy the cluster
4. verify OSDs are coming up (normal encryption verification)

