Bug 1970641

Summary: [GSS][rook] external kms CA cert secret not functional with curl's --capath as implemented today
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Randy Martinez <r.martinez>
Component: rook
Assignee: Sébastien Han <shan>
Status: VERIFIED
QA Contact: Neha Berry <nberry>
Severity: high
Priority: urgent
Version: 4.7
CC: akrai, mhackett, muagarwa, nberry, shan, tdesala, tnielsen
Target Release: OCS 4.8.0
Hardware: All
OS: All
Fixed In Version: 4.8.0-432.ci
Doc Type: Bug Fix
Doc Text:
Cause: The full certificate chain provided contained one self-signed certificate, and no client certificate or private key was provided.
Consequence: curl could not work out how to validate the certificate, especially because the directory storing the certificates had not been processed with OpenSSL's c_rehash, which is required when curl is called with --capath.
Fix: curl is now called with --cacert, which performs the certificate validation we need.
Result: Certificates are validated correctly and the encryption key can be retrieved.
Clones: 1974399
Type: Bug
Bug Blocks: 1974399    

Comment 5 Travis Nielsen 2021-06-14 16:58:53 UTC
The complete solution to this and related Vault issues is to call the vault binary for the configuration instead of trying to do it with curl commands. Before we can use the vault binary (200MB), we really need a complete design. There are several options, including:
1. Add vault to the RHCS image (this increases the image size for all scenarios, not just where Vault is used).
2. Run vault in a separate container (e.g. an init container). If needed, copy the binary from the vault image to the RHCS image. For upstream, I see a vault image already exists on Docker Hub, which would work. For downstream, we would need the build team to chime in on what it would take to package it, since we don't use upstream images in the product.

I'd really like to see the second approach work, but I need to discuss it further with Seb.

That full solution really needs to wait for 4.9; we are too late for 4.8. For 4.8 we need to find a more narrowly scoped fix to unblock the scenario.

Comment 6 Michael Adam 2021-06-14 18:04:33 UTC
@Travis, apart from the conceptually good and complete solution, is there a workaround that could be applied to an existing system?

Comment 8 Travis Nielsen 2021-06-14 18:44:36 UTC
Randy, thanks for confirming the workaround. So you were able to manually update the OSD pod spec by reverting the changes in this PR:
https://github.com/rook/rook/pull/7298

Before we revert that PR in 4.8, I'd like Seb's input. That PR had a purpose, and reverting it may cause other issues.

Comment 10 Sébastien Han 2021-06-21 12:57:17 UTC
Unfortunately, reverting is not possible because it would break https://bugzilla.redhat.com/show_bug.cgi?id=1931839.
Instead, I'm working on a small patch that should fix the issue.

Honestly, I don't know what's going on; somehow we never had to run c_rehash on /etc/vault to get this working with a fully signed fullchain.pem, client cert, and client key.
From the case, it looks like one of the certificates in the chain was self-signed, which might be the root cause.
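The suspicion that one certificate in the chain was self-signed can be checked against the bundle itself. A minimal Go sketch, assuming the bundle is plain PEM: it decodes each CERTIFICATE block and reports those whose issuer equals their subject and whose signature verifies with their own public key. The helpers selfSignedInBundle and makeSelfSignedPEM are hypothetical names for illustration; makeSelfSignedPEM exists only so the example runs without an external file.

```go
package main

import (
	"bytes"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// selfSignedInBundle (hypothetical helper) returns the subjects of all
// self-signed certificates in a PEM bundle: issuer equals subject and the
// signature verifies with the certificate's own public key.
func selfSignedInBundle(bundle []byte) ([]string, error) {
	var found []string
	for len(bundle) > 0 {
		block, rest := pem.Decode(bundle)
		if block == nil {
			break // no more PEM blocks
		}
		bundle = rest
		if block.Type != "CERTIFICATE" {
			continue // skip keys or other PEM content
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, err
		}
		sameName := bytes.Equal(cert.RawIssuer, cert.RawSubject)
		if sameName && cert.CheckSignature(cert.SignatureAlgorithm, cert.RawTBSCertificate, cert.Signature) == nil {
			found = append(found, cert.Subject.String())
		}
	}
	return found, nil
}

// makeSelfSignedPEM builds a throwaway self-signed CA certificate for the demo.
func makeSelfSignedPEM(cn string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	// Template and parent are the same, so the certificate signs itself.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	rootPEM, err := makeSelfSignedPEM("demo-root")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	subjects, err := selfSignedInBundle(rootPEM)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range subjects {
		fmt.Println("self-signed certificate in bundle:", s)
	}
}
```

Against a real file, read fullchain.pem with os.ReadFile and pass its contents to selfSignedInBundle. A full chain that includes its root CA normally contains exactly one self-signed certificate (the root), so a single hit is not an error by itself.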

Unfortunately, we cannot run c_rehash on the directory for various reasons (the openssl binary is not available, permissions, etc.).

The current workaround is the one Randy already provided in https://bugzilla.redhat.com/show_bug.cgi?id=1970641#c7. Not ideal, but better than nothing.

Comment 11 Sébastien Han 2021-06-21 13:06:34 UTC
I'm moving the severity to high since there is some capacity to produce and a workaround is available.

Comment 14 Sébastien Han 2021-06-21 16:20:50 UTC
To verify this BZ:

1. Configure a cluster with cluster-wide encryption using a signed certificate, providing the full certificate chain as a fullchain.pem in the VAULT_CACERT section of the UI.
2. Do NOT provide a client key or certificate.
3. Deploy the cluster.
4. Verify that the OSDs come up (normal encryption verification).