The complete solution to this and related vault issues is to call the vault binary for the configuration instead of trying to do it with curl commands. Before we can use the vault binary (200MB), we really need a complete design. There are several options, including:
1. Add vault to the RHCS image (this increases the image size for all scenarios, not just where vault is used).
2. Run vault in a separate container (e.g. an init container). If needed, copy the binary from the vault image to the RHCS image. For upstream, a vault image already exists on Docker Hub, which would work. For downstream, we would need the build team to chime in on what it would take to package it, since we don't use upstream images in the product.
I'd really like to see the second approach work, but I need to discuss it more with Seb. That full solution really needs to wait for 4.9; we are too late for 4.8. For 4.8 we need to find a more narrowly scoped fix to unblock the scenario.
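To make option 2 concrete, here is a minimal sketch of a pod spec fragment in which an init container copies the vault binary into a shared emptyDir volume that the main container also mounts. Everything in it is an assumption for illustration: the image references, the container and volume names, the mount path, and the location of the binary inside the vault image (/bin/vault) are hypothetical and not taken from Rook or the product.

```python
import json

# Hypothetical values for illustration only; none come from Rook or the product.
VAULT_IMAGE = "hashicorp/vault:latest"  # assumed upstream image reference
RHCS_IMAGE = "example.com/rhcs:tag"     # placeholder for the RHCS image
SHARED_BIN = "/vault-bin"               # hypothetical shared mount point


def osd_pod_spec_with_vault() -> dict:
    """Return a pod spec fragment where an init container copies the vault
    binary into an emptyDir volume that the main container also mounts."""
    shared_volume = {"name": "vault-bin", "emptyDir": {}}
    mount = {"name": "vault-bin", "mountPath": SHARED_BIN}
    return {
        "volumes": [shared_volume],
        "initContainers": [
            {
                "name": "copy-vault-binary",
                "image": VAULT_IMAGE,
                # Assumes the binary lives at /bin/vault in the vault image.
                "command": ["cp", "/bin/vault", SHARED_BIN + "/vault"],
                "volumeMounts": [mount],
            }
        ],
        "containers": [
            {
                "name": "osd",
                "image": RHCS_IMAGE,
                # Same mount, so the main container sees the copied binary.
                "volumeMounts": [mount],
            }
        ],
    }


print(json.dumps(osd_pod_spec_with_vault(), indent=2))
```

The design point is that the RHCS image itself stays unchanged; only pods that actually use vault pay for pulling the vault image.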
@Travis, apart from the conceptually good and complete solution, is there a workaround that could be applied to an existing system?
Randy, thanks for confirming the workaround. So you were able to manually update the OSD pod spec by reverting the changes in this PR (https://github.com/rook/rook/pull/7298). Before we revert that PR in 4.8, I'd like Seb's input. That PR had a purpose, and reverting it may cause other issues.
Unfortunately, reverting would break https://bugzilla.redhat.com/show_bug.cgi?id=1931839 so it's not an option. Instead, I'm working on a small patch that should fix the issue. Honestly, I don't know what's going on: somehow we never had to run c_rehash on /etc/vault to get this working with a fully signed fullchain.pem, client cert, and client key. From reading the case, it looks like one of the certificates in the chain was self-signed, which might be the root cause. Unfortunately, we cannot run c_rehash on that directory for various reasons (the openssl binary is not available, permissions, etc.). The current workaround is the one Randy already provided in https://bugzilla.redhat.com/show_bug.cgi?id=1970641#c7. Not ideal, but better than nothing.
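A hashed CA directory of the kind c_rehash maintains is normally built from one certificate per file, whereas fullchain.pem concatenates the whole chain. As a minimal sketch, assuming a standard PEM bundle, the chain can be split into individual certificates with plain text processing, without needing the openssl binary. The function names and the cert-N.pem output file names are hypothetical, and this does not compute the subject-hash link names that c_rehash creates.

```python
import re
from pathlib import Path

# Matches one PEM certificate block, including its BEGIN/END lines.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_fullchain(pem_text: str) -> list[str]:
    """Return each certificate in a PEM bundle as its own PEM string."""
    return [m.group(0) + "\n" for m in PEM_CERT_RE.finditer(pem_text)]


def write_split_chain(fullchain: Path, out_dir: Path) -> list[Path]:
    """Write cert-0.pem, cert-1.pem, ... (hypothetical names) into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, cert in enumerate(split_fullchain(fullchain.read_text())):
        path = out_dir / f"cert-{i}.pem"
        path.write_text(cert)
        paths.append(path)
    return paths
```

The regex approach only separates the blocks; it does not parse or validate the certificates, so it cannot tell which one in the chain is self-signed.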
I'm moving the severity to high since there is some capacity to produce and a workaround is available.
To verify this BZ:
1. Configure a cluster with cluster-wide encryption using a signed certificate, providing the full certificate chain in a fullchain.pem in the VAULT_CACERT section of the UI.
2. Do NOT use a client key or client cert.
3. Deploy the cluster.
4. Verify that the OSDs come up (normal encryption verification).
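Step 4 can be partly scripted. The sketch below checks that every OSD pod is Running with all containers ready, based on the JSON that `kubectl get pods -o json` returns. The namespace (openshift-storage) and the label selector (app=rook-ceph-osd) are assumptions to adjust for the cluster under test, and the check covers only "OSDs are up", not the encryption itself.

```python
import json
import subprocess

# Assumptions: OSD pods carry the label app=rook-ceph-osd and live in the
# openshift-storage namespace; adjust both for the cluster under test.
NAMESPACE = "openshift-storage"
OSD_SELECTOR = "app=rook-ceph-osd"


def osds_running(pod_list: dict) -> bool:
    """True if there is at least one pod and every pod is Running with all
    of its containers reporting ready."""
    pods = pod_list.get("items", [])
    if not pods:
        return False
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            return False
        statuses = status.get("containerStatuses", [])
        if not statuses or not all(c.get("ready") for c in statuses):
            return False
    return True


def fetch_osd_pods() -> dict:
    """Run kubectl (must be on PATH and logged in) and parse its JSON output."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", NAMESPACE, "-l", OSD_SELECTOR, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Against a live cluster, `osds_running(fetch_osd_pods())` returns True once all OSD pods are up.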