Bug 2187197
| Summary: | Noobaa pods are not coming up after enabling Ceph storageclass encryption | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | narayanspg <ngowda> |
| Component: | csi-driver | Assignee: | Rakshith <rar> |
| Status: | CLOSED NOTABUG | QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13 | CC: | aindenba, mparida, muagarwa, nbecker, ocs-bugs, odf-bz-bot, pakamble, rar |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | --- | | |
| Hardware: | ppc64le | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-05-03 04:39:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Created attachment 1957792 [details]
noobaa operator logs
Getting the below errors when we create a PVC with a storageclass with encryption:

```
PVC new-sc-pvc-1   Namespace NS default   Apr 17, 2023, 12:47 PM
Generated from openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-64888d545-cg8k8_649b40fa-6144-45cd-824b-9a5336059345
14 times in the last 44 minutes

failed to provision volume with StorageClass "newstorageclass-one": rpc error: code = InvalidArgument desc = invalid encryption kms configuration: failed connecting to Vault: failed to get the authentication token: Error making API request.

Namespace: admin
URL: PUT https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200/v1/auth/kubernetes/login
Code: 403. Errors:

* permission denied
```

Hello 🖖

Based on the operator log provided, it appears that the issue is related to the retrieval of the Vault token for the specified authentication method.

Error source:
- https://github.com/libopenstorage/secrets/blob/1022cc4d5aeb8bceedfc664b32667755b35e6a15/vault/utils/utils.go#L159-L161
- https://github.com/libopenstorage/secrets/blob/1022cc4d5aeb8bceedfc664b32667755b35e6a15/vault/vault.go#L106-L110

To address this issue, I suggest verifying that the Vault service is operational and ensuring that the authentication configuration and the Vault credentials used by the operator to connect to Vault are properly configured.

To further investigate and resolve the issue, it would be helpful to review the must-gather logs, including the NooBaa CR YAML. Can you please provide these logs? Thank you.

Hello 🖖

Based on the information provided in comment #3 (https://bugzilla.redhat.com/show_bug.cgi?id=2187197#c3), it seems that a similar error is also originating from ceph-csi rbd.

Error source:
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/internal/rbd/encryption.go#L325-L327
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/internal/kms/vault.go#L288-L291
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/vendor/github.com/libopenstorage/secrets/vault/vault.go#L96-L101
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/vendor/github.com/hashicorp/vault/api/response.go#L118-L124

This suggests that the provided KMS configuration may have issues, since ceph-csi rbd is also encountering problems communicating with Vault and receiving "Code: 403. Errors: * permission denied" when attempting to log in. As a result, it seems likely that the root cause of the problem lies in the configuration of the Vault.

Hi, I am not able to see private messages. Please let me know the info required. I will share the cluster details over IM if you would like access.
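For reference, the failing call can be replayed by hand outside the operators. This is a minimal sketch, assuming `oc create token` is available on this cluster version and using the `rook-ceph-osd` service account, which the `vault read` output below shows bound to the role; the namespace header mirrors the "Namespace: admin" line in the error.

```sh
# Sketch: replay the kubernetes-auth login that the provisioner performs.
VAULT_ADDR="https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200"
JWT=$(oc -n openshift-storage create token rook-ceph-osd)
curl -s --request POST \
  --header "X-Vault-Namespace: admin" \
  --data "{\"jwt\": \"$JWT\", \"role\": \"odf-rook-ceph-osd\"}" \
  "$VAULT_ADDR/v1/auth/kubernetes/login"
```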
Tested the Vault connection from a test pod and from the bastion node.

From the node I get the below output:

```
[root@nara4-cicd-odf-ba4c-sao01-bastion-0 ~]# vault read auth/kubernetes/role/odf-rook-ceph-osd
Key                                 Value
---                                 -----
alias_name_source                   serviceaccount_uid
bound_service_account_names         [rook-ceph-osd]
bound_service_account_namespaces    [openshift-storage]
policies                            [odf]
token_bound_cidrs                   []
token_explicit_max_ttl              0s
token_max_ttl                       0s
token_no_default_policy             false
token_num_uses                      0
token_period                        0s
token_policies                      [odf]
token_ttl                           1440h
token_type                          default
ttl                                 1440h
```

From the test pod:

```
# ./vault read auth/kubernetes/role/odf-rook-ceph-osd
Key                                 Value
---                                 -----
alias_name_source                   serviceaccount_uid
bound_service_account_names         [rook-ceph-osd]
bound_service_account_namespaces    [openshift-storage]
policies                            [odf]
token_bound_cidrs                   []
token_explicit_max_ttl              0s
token_max_ttl                       0s
token_no_default_policy             false
token_num_uses                      0
token_period                        0s
token_policies                      [odf]
token_ttl                           1440h
token_type                          default
ttl                                 1440h
```

Below are the config details; the exact same configuration is working fine on a non-IBM setup (AWS).
```
[root@nara4-cicd-odf-ba4c-sao01-bastion-0 ~]# oc describe cm csi-kms-connection-details
Name:         csi-kms-connection-details
Namespace:    openshift-storage
Labels:       <none>
Annotations:  <none>

Data
====
Vault-test-1:
----
{"encryptionKMSType":"vaulttenantsa","kmsServiceName":"Vault-test-1","vaultAddress":"https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200","vaultBackendPath":"odf/","vaultTLSServerName":"","vaultCAFileName":"","vaultClientCertFileName":"","vaultClientCertKeyFileName":"","vaultAuthMethod":"kubernetes","vaultAuthPath":"/v1/auth/kubernetes/login","vaultAuthNamespace":"","vaultNamespace":"admin"}

BinaryData
====

Events:  <none>

[root@nara4-cicd-odf-ba4c-sao01-bastion-0 ~]# oc describe cm ocs-kms-connection-details
Name:         ocs-kms-connection-details
Namespace:    openshift-storage
Labels:       <none>
Annotations:  <none>

Data
====
KMS_PROVIDER:
----
vault
KMS_SERVICE_NAME:
----
Vault-test-1
VAULT_AUTH_KUBERNETES_ROLE:
----
odf-rook-ceph-op
VAULT_AUTH_METHOD:
----
kubernetes
VAULT_AUTH_MOUNT_PATH:
----
/v1/auth/kubernetes/login
VAULT_ADDR:
----
https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200
VAULT_BACKEND_PATH:
----
odf/
VAULT_NAMESPACE:
----
admin
VAULT_TLS_SERVER_NAME:
----

BinaryData
====

Events:  <none>
[root@nara4-cicd-odf-ba4c-sao01-bastion-0 ~]#
```
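To compare the two KMS configurations field by field, the profiles can be pulled out of the ConfigMaps directly. A sketch, assuming `jq` is installed on the bastion:

```sh
# Show the auth-related fields of the CSI KMS profile next to the OCS variables
oc -n openshift-storage get cm csi-kms-connection-details -o json \
  | jq '.data["Vault-test-1"] | fromjson
        | {vaultAuthMethod, vaultAuthPath, vaultNamespace, vaultBackendPath}'
oc -n openshift-storage get cm ocs-kms-connection-details -o json \
  | jq '.data | {VAULT_AUTH_METHOD, VAULT_AUTH_MOUNT_PATH, VAULT_NAMESPACE, VAULT_BACKEND_PATH}'
```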
Hello @narayanspg 🖖

Based on the NooBaa operator log provided, it appears that the issue is related to the retrieval of the Vault token for the specified authentication method.

Error source:
- https://github.com/libopenstorage/secrets/blob/1022cc4d5aeb8bceedfc664b32667755b35e6a15/vault/utils/utils.go#L159-L161
- https://github.com/libopenstorage/secrets/blob/1022cc4d5aeb8bceedfc664b32667755b35e6a15/vault/vault.go#L106-L110

Based on the information provided in comment #3 (https://bugzilla.redhat.com/show_bug.cgi?id=2187197#c3), it seems that a similar error is also originating from ceph-csi rbd.

Error source:
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/internal/rbd/encryption.go#L325-L327
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/internal/kms/vault.go#L288-L291
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/vendor/github.com/libopenstorage/secrets/vault/vault.go#L96-L101
- https://github.com/ceph/ceph-csi/blob/cd2e25c290a642154c25c4bf42e739f39c1d51bd/vendor/github.com/hashicorp/vault/api/response.go#L118-L124

This suggests that the provided KMS configuration may have issues, since both the NooBaa operator and ceph-csi rbd are encountering problems communicating with Vault, with ceph-csi rbd receiving "Code: 403. Errors: * permission denied" when attempting to log in. As a result, it seems likely that the root cause of the problem lies in the configuration of the Vault credentials.

To address this issue, I suggest verifying that the Vault credentials used by the operators (NooBaa and ceph-csi rbd) to connect to Vault are properly configured. To further investigate and resolve the issue, it would be helpful to review the must-gather logs, including the NooBaa CR YAML.

Questions:
- Can you please provide the additional logs?
- Could you perform a test verifying that you are able to communicate with Vault using the NooBaa operator and ceph-csi rbd Vault credentials?

Thank you.

Hi Alexander,

There is some problem in uploading the must-gather with this account, so I shared it over Box for temporary access. You can access it here - https://ibm.box.com/s/ddjwhvw705d5yf9lzbsntzhtbzuvns49

Below is the connection test on the noobaa operator pod:

```
[root@nara4-cicd-odf-ba4c-sao01-bastion-0 ~]# oc rsh noobaa-operator-76d488695b-wtv6r
#exported variables
sh-5.1$ ./vault read auth/kubernetes/role/odf-rook-ceph-osd
Key                                 Value
---                                 -----
alias_name_source                   serviceaccount_uid
bound_service_account_names         [rook-ceph-osd]
bound_service_account_namespaces    [openshift-storage]
policies                            [odf]
token_bound_cidrs                   []
token_explicit_max_ttl              0s
token_max_ttl                       0s
token_no_default_policy             false
token_num_uses                      0
token_period                        0s
token_policies                      [odf]
token_ttl                           1440h
token_type                          default
ttl                                 1440h
sh-5.1$ exit
exit
```

You can also access the cluster with the below details:

```
web_console_url = "https://console-openshift-console.apps.nara4-cicd-odf-ba4c.redhat.com"
kubeadmin-password/Sm9Yd-3YJJY-CfyxU-ncY5r
etc_hosts_entries = <<EOT
169.57.180.37 api.nara4-cicd-odf-ba4c.redhat.com console-openshift-console.apps.nara4-cicd-odf-ba4c.redhat.com integrated-oauth-server-openshift-authentication.apps.nara4-cicd-odf-ba4c.redhat.com oauth-openshift.apps.nara4-cicd-odf-ba4c.redhat.com prometheus-k8s-openshift-monitoring.apps.nara4-cicd-odf-ba4c.redhat.com grafana-openshift-monitoring.apps.nara4-cicd-odf-ba4c.redhat.com example.apps.nara4-cicd-odf-ba4c.redhat.com
```

Hello @narayanspg 🖖,
Thank you for sharing the must-gather with me. Unfortunately, I encountered an error while accessing the web console cluster URL, even though I was connected to the RH VPN.
The NooBaa CR KMS declaration is:
```yaml
kms:
connectionDetails:
KMS_PROVIDER: vault
KMS_SERVICE_NAME: Vault-test-1
VAULT_ADDR: https://vault-cluster.vault.2467e33a-73f9-408b-b9ff-b0476a654d30.aws.hashicorp.cloud:8200
VAULT_AUTH_KUBERNETES_ROLE: odf-rook-ceph-op
VAULT_AUTH_METHOD: kubernetes
VAULT_AUTH_MOUNT_PATH: /v1/auth/kubernetes/login
VAULT_BACKEND_PATH: odf/
VAULT_NAMESPACE: admin
VAULT_TLS_SERVER_NAME: ""
```
Upon examining the NooBaa CR KMS declaration you provided, it appears that Kubernetes service account authentication is meant to be used. However, I could not find any information about "odf-rook-ceph-op" in the must-gather. Could you please provide more details about the service account "odf-rook-ceph-op" and its token secret? You can use the following commands to retrieve the information:
```
kubectl -n <NS> get sa odf-rook-ceph-op -o yaml
kubectl -n <NS> get secret <SA-TOKEN> -o yaml
```
Also, regarding the VAULT_AUTH_MOUNT_PATH configuration parameter, I'm not entirely sure about its usage. Is there a specific reason for defining it, and how was its value derived? Does it mean that the service account token gets mounted in a non-default path? According to a ceph-csi PR (https://github.com/ceph/ceph-csi/pull/2322), it seems that this value might have been mistakenly taken from the default auth path; note that there is no "mount" component in the name. Therefore, I suggest removing the VAULT_AUTH_MOUNT_PATH variable and relying on the library's default value unless there is a good reason to use it.
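If it helps, here is a hedged sketch of removing that variable from the OCS ConfigMap; back the ConfigMap up first, and note that whether the operators pick the change up live or need a restart is not verified here.

```sh
# Sketch: drop VAULT_AUTH_MOUNT_PATH so the library's default auth path is used
oc -n openshift-storage get cm ocs-kms-connection-details -o yaml > ocs-kms-backup.yaml
oc -n openshift-storage patch cm ocs-kms-connection-details \
  --type=json -p '[{"op": "remove", "path": "/data/VAULT_AUTH_MOUNT_PATH"}]'
```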
Could you please try again with that variable removed?
Thank you for your help!
Hi Alexander,
We recreated the environment from scratch; this time we created the storagesystem without encryption, and the storagecluster reached the Ready state.
We then created a new storageclass with encryption enabled and created a PVC, and we are getting the same error.
Below are the details requested.
```
[root@nara6-cicd-odf-e189-sao01-bastion-0 vault]# vault read auth/kubernetes/role/odf-rook-ceph-op
Key                                 Value
---                                 -----
alias_name_source                   serviceaccount_uid
bound_service_account_names         [rook-ceph-system rook-ceph-osd noobaa]
bound_service_account_namespaces    [openshift-storage]
policies                            [odf]
token_bound_cidrs                   []
token_explicit_max_ttl              0s
token_max_ttl                       0s
token_no_default_policy             false
token_num_uses                      0
token_period                        0s
token_policies                      [odf]
token_ttl                           1440h
token_type                          default
ttl                                 1440h
```
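Note that a successful `vault read` of the role only proves the client can reach Vault with an already-valid token; the failing step is the login itself, where Vault must call back to the cluster to review the service account JWT. A sketch of exercising that path directly, using the `noobaa` service account listed in `bound_service_account_names` above (assumes `VAULT_ADDR` and `VAULT_NAMESPACE` are exported as in the earlier tests, and that `oc create token` is available):

```sh
# Sketch: attempt the actual kubernetes-auth login that fails with 403
vault write auth/kubernetes/login role=odf-rook-ceph-op \
  jwt="$(oc -n openshift-storage create token noobaa)"
```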
```
oc get secret odf-vault-auth-token -o yaml
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: odf-vault-auth
    kubernetes.io/service-account.uid: 7dae76ea-5aef-46a6-b8f7-b698b16d896b
  creationTimestamp: "2023-04-20T14:15:33Z"
  name: odf-vault-auth-token
  namespace: openshift-storage
  resourceVersion: "469116"
  uid: 122964b6-3bc9-45fd-9a7e-8bbcd47cd48a
type: kubernetes.io/service-account-token
```
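It can also be worth checking, on the Vault side, which cluster endpoint the Kubernetes auth method was configured with, since that host is what Vault contacts for token review (run with an authenticated `vault` CLI against this Vault namespace):

```sh
# The kubernetes_host shown here must be reachable from the Vault server itself
vault read auth/kubernetes/config
```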
Below are the cluster details:

```
web_console_url = "https://console-openshift-console.apps.nara6-cicd-odf-e189.redhat.com"
kubeadmin-password/iv4ZG-KbFDS-DvXoN-7DieY
etc_hosts_entries = <<EOT
169.57.180.34 api.nara6-cicd-odf-e189.redhat.com console-openshift-console.apps.nara6-cicd-odf-e189.redhat.com integrated-oauth-server-openshift-authentication.apps.nara6-cicd-odf-e189.redhat.com oauth-openshift.apps.nara6-cicd-odf-e189.redhat.com prometheus-k8s-openshift-monitoring.apps.nara6-cicd-odf-e189.redhat.com grafana-openshift-monitoring.apps.nara6-cicd-odf-e189.redhat.com example.apps.nara6-cicd-odf-e189.redhat.com
```
With the same configuration and Vault instance, it works on a non-IBM cluster.
Thanks,
Narayan
Rakshith/Malay, please take a look in parallel to Noobaa.

Hey, please link the must-gather (the temp link does not have the folder anymore). Did this encryption work with ODF 4.12? Can you share a link to the documentation you are following to set this up?

Hi Rakshith,

I have shared the must-gather here - https://ibm.box.com/s/mcevrb1qnny73die4gbaixh5nt2fs0r1

We haven't performed the encryption test with ODF 4.12. Below are the documents shared by Parag, and the same steps are working fine in a non-IBM environment:

- https://docs.google.com/document/d/1g8dP3ba8wb5Ruvtnjnvm3w593iQLr_XTWdj2LZ5JmFY/edit#heading=h.p9mwy576e85q
- https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html/managing_and_allocating_storage_resources/storage-classes_rhodf#configuring-access-to-kms-using-vaulttenantsa_rhodf

The exact problem is that Vault needs to talk to the OCP cluster to verify the SA token, but the `OCP_HOST` URL provided is not reachable from the Vault server.

> https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html/managing_and_allocating_storage_resources/storage-classes_rhodf#configuring-access-to-kms-using-vaulttenantsa_rhodf
> Step 4: Retrieve the OpenShift cluster endpoint.

```
$ OCP_HOST=$(oc config view --minify --flatten -o jsonpath="{.clusters[0].cluster.server}")
```

Following the steps, I get the following as the endpoint:

```
[rakshith@fedora ~]$ OCP_HOST=$(oc config view --minify --flatten -o jsonpath="{.clusters[0].cluster.server}")
[rakshith@fedora ~]$ echo $OCP_HOST
https://api.nara6-cicd-odf-e189.redhat.com:6443
```

But this is not reachable from Vault. We had to add the domain entries manually on our laptops to access the cluster:

```
etc_hosts_entries = <<EOT
169.57.180.34 api.nara6-cicd-odf-e189.redhat.com console-openshift-console.apps.nara6-cicd-odf-e189.redhat.com integrated-oauth-server-openshift-authentication.apps.nara6-cicd-odf-e189.redhat.com oauth-openshift.apps.nara6-cicd-odf-e189.redhat.com prometheus-k8s-openshift-monitoring.apps.nara6-cicd-odf-e189.redhat.com grafana-openshift-monitoring.apps.nara6-cicd-odf-e189.redhat.com example.apps.nara6-cicd-odf-e189.redhat.com
```

You have to find a way to add these entries to the Vault server, or change the endpoint URL to properly point to the cluster.

Hi Rakshith,

In comments #11 and #13 we tried to test connectivity with the Vault instance we are using; the results are the same. In the non-IBM environment there were no host entries added. We are using enterprise Vault, which is a hosted service by HashiCorp, so we can't add a host entry there.

(In reply to narayanspg from comment #18)
> comment #11 and #13 we have tried to test connectivity with Vault instance we are using. same are the results. on Non IBM environment there were no host entries added.

It's about Vault being able to talk to the cluster; I know the other way around works.

Try with OCP_HOST=https://169.57.180.34:6443

(In reply to narayanspg from comment #20)
> tried with OCP_HOST=https://169.57.180.34:6443 but didn't work. PVC creation failed.
> vSphere environment is seen with the same issue as well.

Closing this BZ, since this is not a product bug: the feature works on AWS with the new HCP Vault service, and on vSphere and other clusters within the VPN when a self-hosted community Vault service was used. QE needs to create a cluster with a publicly visible endpoint for Vault to verify the SA token, or figure out a way for Vault to be able to talk with the OCP cluster. Please reopen the bug with the necessary justification.

Thanks. Tried with a community Vault instance; after adding host entries on the Vault server, the issue is not seen.
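For reference, this is the setup step from the documentation linked above where `OCP_HOST` ends up on the Vault side; whatever is written as `kubernetes_host` must be resolvable and routable from the Vault server. The variable names follow the doc's flow; treat this as a sketch, not verified against this cluster:

```sh
# vaulttenantsa setup: point the kubernetes auth method at the cluster API.
# OCP_HOST must be reachable from Vault for TokenReview callbacks to succeed.
vault write auth/kubernetes/config \
  token_reviewer_jwt="$SA_JWT_TOKEN" \
  kubernetes_host="$OCP_HOST" \
  kubernetes_ca_cert="$SA_CA_CRT"
```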
Description of problem (please be detailed as possible and provide log snippets):

Noobaa pods are not coming up after enabling Ceph storageclass encryption. Due to this, the storagecluster is not getting to the Ready state, with the info "Waiting on Nooba instance to finish initialization".

noobaa operator logs:

```
time="2023-04-17T07:11:58Z" level=error msg="ReconcileRootSecret, NewKMS error failed to get the authentication token: authentication returned nil auth info" sys=openshift-storage/noobaa
time="2023-04-17T07:11:58Z" level=info msg="setKMSConditionStatus Invalid" sys=openshift-storage/noobaa
time="2023-04-17T07:11:58Z" level=info msg="SetPhase: temporary error during phase \"Creating\"" sys=openshift-storage/noobaa
time="2023-04-17T07:11:58Z" level=warning msg="⏳ Temporary Error: failed to get the authentication token: authentication returned nil auth info" sys=openshift-storage/noobaa
```

Version of all relevant components (if applicable):

```
[root@nara4-cicd-odf-ba4c-sao01-bastion-0 ~]# oc describe csv odf-operator.v4.13.0 -n openshift-storage | grep full
Labels: full_version=4.13.0-165
f:full_version:
[root@nara4-cicd-odf-ba4c-sao01-bastion-0 ~]# oc get clusterversion
NAME      VERSION                                      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-ppc64le-2023-02-17-084453   True        False         4h24m   Cluster version is 4.13.0-0.nightly-ppc64le-2023-02-17-084453
```

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Not able to progress on the PV encryption feature.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create OCP and install ODF
2. Enable storageclass encryption during storagesystem creation
3. The storagecluster will be stuck in the Progressing state because the noobaa pods do not come up

Actual results:
storagecluster stuck in the Progressing state

Expected results:
storagecluster should reach the Ready state

Additional info:
Attaching noobaa operator logs and must-gather.
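Once connectivity between Vault and the cluster API is in place, a quick illustrative check that the symptom is gone (PVC name taken from the events above):

```sh
# The PVC should reach Bound and the storagecluster should leave Progressing
oc -n default get pvc new-sc-pvc-1
oc -n openshift-storage get storagecluster
```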