Bug 2061657

Summary: [KMS] The error message is incorrect when the odf-vault-auth SA is deleted and the OSD pod is respun
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Rachael <rgeorge>
Component: rook
Assignee: Sébastien Han <shan>
Status: CLOSED CURRENTRELEASE
QA Contact: Rachael <rgeorge>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.10
CC: madam, mmuench, muagarwa, nberry, ocs-bugs, odf-bz-bot, shan
Target Milestone: ---
Target Release: ODF 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.10.0-210
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-04-21 09:12:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rachael 2022-03-08 07:40:37 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When the odf-vault-auth serviceaccount is deleted and the OSD pod is respun, the pod goes into Init:CrashLoopBackOff state as expected. The logs from the encryption-kms-get-kek container in the OSD pod show the following error message:

$ oc logs -c encryption-kms-get-kek rook-ceph-osd-1-bb55c5d5-645h2
2022-03-08 07:20:27.648973 C | rookcmd: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.default.svc.cluster.local:8200/v1/auth/kubernetes/login
Code: 403. Errors:

* permission denied

Since the authentication method is kubernetes, the error message should mention the missing serviceaccount rather than a failure to get the authentication token.



Version of all relevant components (if applicable):
---------------------------------------------------
OCP: 4.10.0-0.nightly-2022-03-08-002944
ODF: odf-operator.v4.10.0   full_version=4.10.0-179


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No


Is there any workaround available to the best of your knowledge?
N/A


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1


Is this issue reproducible?
Yes


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
-------------------
1. Deploy an ODF cluster with cluster-wide encryption enabled, using a KMS with the kubernetes authentication method.

2. Once the cluster is up and running, delete the odf-vault-auth SA:
   $ oc delete sa odf-vault-auth
   serviceaccount "odf-vault-auth" deleted

3. Respin one of the OSD pods:
   $ oc delete pod rook-ceph-osd-1-bb55c5d5-5n2qf
   pod "rook-ceph-osd-1-bb55c5d5-5n2qf" deleted

4. The OSD pod should go into Init:CrashLoopBackOff state. Check the logs of the encryption-kms-get-kek container in the new OSD pod:
   $ oc logs -c encryption-kms-get-kek rook-ceph-osd-1-bb55c5d5-645h2


Actual results:
---------------
The error message shown below refers to a failure to get the authentication token.

2022-03-08 07:20:27.648973 C | rookcmd: failed to validate kms connection details: failed to get backend version: failed to initialize vault client: failed to get vault authentication token: Error making API request.

URL: PUT https://vault.default.svc.cluster.local:8200/v1/auth/kubernetes/login
Code: 403. Errors:

* permission denied

Expected results:
-----------------
The error message should mention the missing serviceaccount.

Comment 2 Sébastien Han 2022-03-10 08:58:17 UTC
It's hard to actually make the distinction here. Also, the message "failed to get vault authentication token:" is not the last error; it is just one link in the chain of errors.
I understand the confusion, but internally, when using kube auth with Vault, a token is also used for authentication. So the message is actually correct.

I can probably change the error message if you think that would avoid the confusion, even though technically the error is correct.
What do you think?

Comment 4 Sébastien Han 2022-03-10 10:07:40 UTC
Sounds good then.