Description of problem (please be as detailed as possible and provide log snippets):

We have a ROSA provider cluster with ODF 4.10.0-122. The ceph-external-cluster-details-exporter.py script failed with the error:

Error: key for client.csi-rbd-node exists but cap mon does not match

Version of all relevant components (if applicable):
OpenShift version: 4.9.15
ceph version 16.2.7-35.el8cp (51d904cb9b9eb82f2c11b4cf5252ab3f3ff0d6b4) pacific (stable)
OCS - 4.10.0-122

Does this issue impact your ability to continue to work with the product? (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
1. Delete the CSI clients and the script will recreate them.
or
2. The script is buggy at lines 862, 863, 873, and 874. Get the expected key values from the corresponding commands on your cluster and hardcode them in the script.

----------- Error-prone code lines ---------------
862 #self.out_map['CSI_RBD_NODE_SECRET_SECRET'] = self.create_cephCSIKeyring_RBDNode()
863 #self.out_map['CSI_RBD_PROVISIONER_SECRET'] = self.create_cephCSIKeyring_RBDProvisioner()
...
872 if self.out_map['CEPHFS_FS_NAME'] and self.out_map['CEPHFS_POOL_NAME']:
873     #self.out_map['CSI_CEPHFS_NODE_SECRET'] = self.create_cephCSIKeyring_cephFSNode()
874     #self.out_map['CSI_CEPHFS_PROVISIONER_SECRET'] = self.create_cephCSIKeyring_cephFSProvisioner()

----------------- Get the values from ceph commands on your cluster (example below) --------
sh-4.4$ ceph auth get-or-create client.csi-rbd-node
[client.csi-rbd-node]
        key = AQDlo/ZhnzIjAxAAszmP7FdCQdD+0GlSCO5A4A==
sh-4.4$ ceph auth get-or-create client.csi-rbd-provisioner
[client.csi-rbd-provisioner]
        key = AQDko/ZhN9y2MRAAyDvTG01MzfY0JaLu7YihJA==
sh-4.4$ ceph auth get-or-create client.csi-cephfs-node
[client.csi-cephfs-node]
        key = AQDlo/ZhNdDQHBAARJv2OfopA6hioIbDrJLcUA==
sh-4.4$ ceph auth get-or-create client.csi-cephfs-provisioner
[client.csi-cephfs-provisioner]
        key = AQDlo/ZhkJP2DxAAqcP7mm/5dU4+hFlzRgYGHQ==

------------------- Replace these values in the code ------------------
862 self.out_map['CSI_RBD_NODE_SECRET_SECRET'] = 'AQDlo/ZhnzIjAxAAszmP7FdCQdD+0GlSCO5A4A=='
863 self.out_map['CSI_RBD_PROVISIONER_SECRET'] = 'AQDko/ZhN9y2MRAAyDvTG01MzfY0JaLu7YihJA=='
872 if self.out_map['CEPHFS_FS_NAME'] and self.out_map['CEPHFS_POOL_NAME']:
873     self.out_map['CSI_CEPHFS_NODE_SECRET'] = 'AQDlo/ZhNdDQHBAARJv2OfopA6hioIbDrJLcUA=='
874     self.out_map['CSI_CEPHFS_PROVISIONER_SECRET'] = 'AQDlo/ZhkJP2DxAAqcP7mm/5dU4+hFlzRgYGHQ=='
=========================================================================
Now execute the script.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproducible?
2/2

Can this issue reproduce from the UI?
NA

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install ODF 4.10.
2. Download ceph-external-cluster-details-exporter.py using the command:
oc get csv $(oc get csv -n openshift-storage | grep ocs-operator | awk '{print $1}') -n openshift-storage -o jsonpath='{.metadata.annotations.external\.features\.ocs\.openshift\.io/export-script}' | base64 --decode > ceph-external-cluster-details-exporter.py
3. Copy the python script to the toolbox pod:
oc cp ceph-external-cluster-details-exporter.py <tool-box-pod-name>:/
4. Execute the script in the toolbox to get the JSON blob:
oc exec -n openshift-storage <tool-box-pod-name> -- python3 ceph-external-cluster-details-exporter.py --rbd-data-pool-name replicapool1 > ceph_credentials.json

Actual results:
Execution error:
--rbd-data-pool-name replicapool > ceph_credentials.json
Traceback (most recent call last):
  File "create-external-cluster-resources.py", line 1191, in <module>
    raise err
  File "create-external-cluster-resources.py", line 1188, in <module>
    rjObj.main()
  File "create-external-cluster-resources.py", line 1169, in main
    generated_output = self.gen_json_out()
  File "create-external-cluster-resources.py", line 898, in gen_json_out
    self._gen_output_map()
  File "create-external-cluster-resources.py", line 862, in _gen_output_map
    self.out_map['CSI_RBD_NODE_SECRET_SECRET'] = self.create_cephCSIKeyring_RBDNode()
  File "create-external-cluster-resources.py", line 755, in create_cephCSIKeyring_RBDNode
    "Error: {}".format(err_msg if ret_val != 0 else self.EMPTY_OUTPUT_LIST))
__main__.ExecutionFailureException: 'auth get-or-create client.csi-rbd-node' command failed
Error: key for client.csi-rbd-node exists but cap mon does not match

Expected results:
JSON blob

Additional info:
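As an illustration of workaround 1, a minimal sketch that deletes the pre-existing CSI clients so the exporter can recreate them. This is not part of the shipped script; it assumes the standard client names Rook creates, and `ceph auth del` permanently removes each client's key, so it is only safe when no consumer cluster depends on those keys yet.

-------- sketch: delete pre-existing CSI clients (hypothetical) --------
import subprocess

# Standard CSI client names created by Rook / the exporter script.
CSI_CLIENTS = [
    "client.csi-rbd-node",
    "client.csi-rbd-provisioner",
    "client.csi-cephfs-node",
    "client.csi-cephfs-provisioner",
]

for client in CSI_CLIENTS:
    # 'ceph auth del' removes the client entity and invalidates its key.
    subprocess.run(["ceph", "auth", "del", client], check=True)
------------------------------------------------------------------------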
Hi,

We will always get these errors when the Ceph users are already present and we try to create the same user with different caps. Also, IIRC, here you are trying to run the script in the toolbox, where we already have the ODF cluster and storage cluster, so it will throw errors: in a normal (non-external) cluster, Rook creates these users with different caps than the script expects. If I'm not wrong, the aim of the python script is to run on an RHCS cluster, where we don't expect these Ceph users to already be present.

The current workaround is to delete the Ceph users and let the script create them. This is what we discussed on the call this morning; please correct me or add anything I missed.

In my view, this can be a feature request/RFE, either to support running the script in the toolbox as well, or to handle the case where these users are already present (see the sketch below).
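To make the second RFE option concrete, a rough sketch (hypothetical, not the actual script code) of how the exporter could detect an already-present client and reuse its key instead of failing inside `get-or-create`:

-------- sketch: reuse an existing client's key (hypothetical) --------
import json
import subprocess

def get_existing_client_key(entity):
    # 'ceph auth get <entity> -f json' exits non-zero when the entity does
    # not exist, so a failed call just means "not present yet".
    proc = subprocess.run(
        ["ceph", "auth", "get", entity, "-f", "json"],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return None
    # The JSON output is a list with one entry per matching entity.
    return json.loads(proc.stdout)[0]["key"]

# If the client already exists, reuse its key; otherwise the script could
# fall back to its existing get-or-create path.
key = get_existing_client_key("client.csi-rbd-node")
------------------------------------------------------------------------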
I am working on an --upgrade flag functionality for the RHCS cluster, for clusters that will upgrade from 4.9 to 4.10:
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2044983
PR: https://github.com/rook/rook/pull/9609
I can also add an auto-upgrade-users functionality, so users created in ODF by Rook can also take advantage of the upgrade and be auto-upgraded with the new permissions. In the meantime, you can also use the workaround of deleting the already-created users.
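As a rough illustration of what such an upgrade path can do (an assumption-laden sketch, not the code from the PR above): `ceph auth caps` updates an existing client's caps in place while leaving its key untouched. Note that it replaces the full cap set, so old and new caps must both be listed; the target caps below are the 4.10 values shown later in this bug.

-------- sketch: upgrade caps of existing RBD clients (hypothetical) --------
import subprocess

# 4.10 target caps; the mon "allow command 'osd blocklist'" part is the
# addition relative to 4.9. Only the two RBD clients are shown for brevity.
UPGRADED_CAPS = {
    "client.csi-rbd-node": [
        "mon", "profile rbd, allow command 'osd blocklist'",
        "osd", "profile rbd",
    ],
    "client.csi-rbd-provisioner": [
        "mgr", "allow rw",
        "mon", "profile rbd, allow command 'osd blocklist'",
        "osd", "profile rbd",
    ],
}

for entity, caps in UPGRADED_CAPS.items():
    # 'ceph auth caps' overwrites the entire cap set; the key is preserved.
    subprocess.run(["ceph", "auth", "caps", entity] + caps, check=True)
-----------------------------------------------------------------------------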
Part of: https://github.com/rook/rook/pull/9703
Parth, you need to cherry-pick the changes to https://github.com/red-hat-storage/rook/tree/release-4.10 before moving the BZ to MODIFIED
Neha, the problem is that we use the `get-or-create` command for creating users, so I will separate it and do the get and create steps independently for the users. This will also help if we have any future updates.

We also need backports of these changes to older versions (4.9, 4.8, ... as far back as we support RHCS), so I am creating clone BZs for that. We need this because the check for already-existing users is missing in previous versions.
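A sketch of the separated flow described above (hypothetical illustration, not the merged Rook change): try a plain `get` first; if the entity exists, align its caps, otherwise create it, so a pre-existing user no longer makes `get-or-create` fail.

-------- sketch: get and create as separate steps (hypothetical) --------
import subprocess

def ensure_client(entity, caps):
    # Step 1: a plain 'get' succeeds only if the entity already exists.
    exists = subprocess.run(
        ["ceph", "auth", "get", entity],
        capture_output=True,
    ).returncode == 0
    if exists:
        # Step 2a: entity exists; update its caps instead of failing.
        subprocess.run(["ceph", "auth", "caps", entity] + caps, check=True)
    else:
        # Step 2b: entity missing; create it with the desired caps.
        subprocess.run(
            ["ceph", "auth", "get-or-create", entity] + caps, check=True,
        )

ensure_client(
    "client.csi-rbd-node",
    ["mon", "profile rbd, allow command 'osd blocklist'",
     "osd", "profile rbd"],
)
--------------------------------------------------------------------------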
Please add doc text
Update:
========
The below scenarios were tested.

1. Direct deployment of ODF 4.10 (external cluster that doesn't have any csi users)

deploy job successful: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3585/console

Verified csi users are created with the new caps (osd blocklist):

client.csi-cephfs-node
        key: AQA7kjliygK0MBAAVSOb/P4oBlHWiHKwxCjYGw==
        caps: [mds] allow rw
        caps: [mgr] allow rw
        caps: [mon] allow r, allow command 'osd blocklist'
        caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
        key: AQA7kjli5yfpMBAA9kgjzsY5RSYC86U+b96F6g==
        caps: [mgr] allow rw
        caps: [mon] allow r, allow command 'osd blocklist'
        caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
        key: AQA7kjli0bQ5MBAABR7f3d+9pEDRNuXzKC9nOA==
        caps: [mon] profile rbd, allow command 'osd blocklist'
        caps: [osd] profile rbd
client.csi-rbd-provisioner
        key: AQBGkjliPXb5BRAAYoqHF5MY70mqsTT9eYYy7A==
        caps: [mgr] allow rw
        caps: [mon] profile rbd, allow command 'osd blocklist'
        caps: [osd] profile rbd
client.healthchecker

2. External cluster has csi users from 4.9, then deployment of ODF 4.10

deploy job successful: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3587/console

Verified the same caps exist for the csi users after ODF 4.10 deployment:

client.csi-cephfs-node
        key: AQCzjzliDV7+KRAAPB8SGR2gD8cchK+iUeIgzw==
        caps: [mds] allow rw
        caps: [mgr] allow rw
        caps: [mon] allow r
        caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
        key: AQCzjzliRLYuKhAA7ek/Pjuct8tUxpFzm4DZIw==
        caps: [mgr] allow rw
        caps: [mon] allow r
        caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
        key: AQCzjzlir1SSKRAAPN5RnhFIXPCp3oeH12h56Q==
        caps: [mon] profile rbd
        caps: [osd] profile rbd
client.csi-rbd-provisioner
        key: AQCzjzli2mTMKRAAk8oNubwgKGqdzm2UE5ZqZw==
        caps: [mgr] allow rw
        caps: [mon] profile rbd
        caps: [osd] profile rbd
upgrade from ocs-registry:4.9.5-4 to ocs-registry:4.10.0-210 is successful

Before starting the upgrade, below are the csi users:

client.csi-cephfs-node
        key: AQBNGUVi89jxDxAAaN016yQ1fadOm6YsA5saGA==
        caps: [mds] allow rw
        caps: [mgr] allow rw
        caps: [mon] allow r
        caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
        key: AQBNGUVigFgfExAAYxQw0DIf+tHwcrKQ3uJi8A==
        caps: [mgr] allow rw
        caps: [mon] allow r
        caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
        key: AQBNGUVieixTCxAAZckCVach2PSE33OlGrmb+Q==
        caps: [mon] profile rbd
        caps: [osd] profile rbd
client.csi-rbd-provisioner
        key: AQBNGUViV3RbDRAA18MO0FlHw+xExiVpvQUEuw==
        caps: [mgr] allow rw
        caps: [mon] profile rbd
        caps: [osd] profile rbd

After upgrade (python /tmp/external-cluster-details-exporter-hfwbxk23.py --upgrade):

client.csi-cephfs-node
        key: AQBNGUVi89jxDxAAaN016yQ1fadOm6YsA5saGA==
        caps: [mds] allow rw
        caps: [mgr] allow rw
        caps: [mon] allow r, allow command 'osd blocklist'
        caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
        key: AQBNGUVigFgfExAAYxQw0DIf+tHwcrKQ3uJi8A==
        caps: [mgr] allow rw
        caps: [mon] allow r, allow command 'osd blocklist'
        caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
        key: AQBNGUVieixTCxAAZckCVach2PSE33OlGrmb+Q==
        caps: [mon] profile rbd, allow command 'osd blocklist'
        caps: [osd] profile rbd
client.csi-rbd-provisioner
        key: AQBNGUViV3RbDRAA18MO0FlHw+xExiVpvQUEuw==
        caps: [mgr] allow rw
        caps: [mon] profile rbd, allow command 'osd blocklist'
        caps: [osd] profile rbd

Marking as verified.
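For anyone repeating this verification, a small sketch (a hypothetical helper, assuming it runs in the toolbox pod) that checks every CSI client carries the new 'osd blocklist' mon cap after deployment or upgrade:

-------- sketch: verify the new mon cap on all CSI clients (hypothetical) --------
import json
import subprocess

CSI_CLIENTS = (
    "client.csi-rbd-node",
    "client.csi-rbd-provisioner",
    "client.csi-cephfs-node",
    "client.csi-cephfs-provisioner",
)

for entity in CSI_CLIENTS:
    out = subprocess.run(
        ["ceph", "auth", "get", entity, "-f", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    # The JSON output is a list with one entry per matching entity.
    mon_caps = json.loads(out)[0]["caps"]["mon"]
    assert "osd blocklist" in mon_caps, f"{entity} is missing the blocklist cap"
    print(f"{entity}: mon caps OK ({mon_caps})")
----------------------------------------------------------------------------------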
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1372