Bug 2048458 - python exporter script 'ceph-external-cluster-details-exporter.py' error cap mon does not match on ODF 4.10
Summary: python exporter script 'ceph-external-cluster-details-exporter.py' error cap mon does not match on ODF 4.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.10.0
Assignee: Parth Arora
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks: 2052607 2052608
 
Reported: 2022-01-31 09:58 UTC by suchita
Modified: 2023-08-09 17:03 UTC
CC: 13 users

Fixed In Version: 4.10.0-163
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2052607 2052608 (view as bug list)
Environment:
Last Closed: 2022-04-13 18:52:41 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage rook pull 342 0 None open Bug 2048458: csi: make create and get process separate for csi users 2022-02-09 13:26:42 UTC
Github rook rook pull 9703 0 None Merged csi: make create and get process separate for csi users 2022-02-08 11:45:28 UTC
Red Hat Product Errata RHSA-2022:1372 0 None None None 2022-04-13 18:52:59 UTC

Description suchita 2022-01-31 09:58:07 UTC
Description of problem (please be detailed as possible and provide a log
snippets):
We have a ROSA provider cluster with ODF 4.10.0-122.
The ceph-external-cluster-details-exporter.py script fails with an error:
Error: key for client.csi-rbd-node exists but cap mon does not match


Version of all relevant components (if applicable):
OpenShift version:    4.9.15
ceph version 16.2.7-35.el8cp (51d904cb9b9eb82f2c11b4cf5252ab3f3ff0d6b4) pacific (stable)
OCS - 4.10.0-122




Does this issue impact your ability to continue to work with the product?
(please explain in detail what is the user impact)?
yes


Is there any workaround available to the best of your knowledge?
1. Delete the existing CSI clients and let the script recreate them (see the sketch after this workaround section).
or
2. The script is buggy at lines 862, 863, 873, and 874: get the expected key values from the commands on your cluster and hardcode them in the script.
-----------Error-prone code lines---------------
862         #self.out_map['CSI_RBD_NODE_SECRET_SECRET'] = self.create_cephCSIKeyring_RBDNode()
863         #self.out_map['CSI_RBD_PROVISIONER_SECRET'] = self.create_cephCSIKeyring_RBDProvisioner()
...
872         if self.out_map['CEPHFS_FS_NAME'] and self.out_map['CEPHFS_POOL_NAME']:
873             #self.out_map['CSI_CEPHFS_NODE_SECRET'] = self.create_cephCSIKeyring_cephFSNode()
874             #self.out_map['CSI_CEPHFS_PROVISIONER_SECRET'] = self.create_cephCSIKeyring_cephFSProvisioner()
-----------------Get the values from the ceph commands on your cluster, for example:--------
sh-4.4$ ceph auth get-or-create client.csi-rbd-node
[client.csi-rbd-node]
	key = AQDlo/ZhnzIjAxAAszmP7FdCQdD+0GlSCO5A4A==
sh-4.4$ ceph auth get-or-create client.csi-rbd-provisioner
[client.csi-rbd-provisioner]
	key = AQDko/ZhN9y2MRAAyDvTG01MzfY0JaLu7YihJA==
sh-4.4$ ceph auth get-or-create client.csi-cephfs-node
[client.csi-cephfs-node]
	key = AQDlo/ZhNdDQHBAARJv2OfopA6hioIbDrJLcUA==
sh-4.4$ ceph auth get-or-create client.csi-cephfs-provisioner
[client.csi-cephfs-provisioner]
	key = AQDlo/ZhkJP2DxAAqcP7mm/5dU4+hFlzRgYGHQ==
-------------------Replace these values in the code------------------
862         self.out_map['CSI_RBD_NODE_SECRET_SECRET'] = 'AQDlo/ZhnzIjAxAAszmP7FdCQdD+0GlSCO5A4A=='
863         self.out_map['CSI_RBD_PROVISIONER_SECRET'] = 'AQDko/ZhN9y2MRAAyDvTG01MzfY0JaLu7YihJA=='

872         if self.out_map['CEPHFS_FS_NAME'] and self.out_map['CEPHFS_POOL_NAME']:
873             self.out_map['CSI_CEPHFS_NODE_SECRET'] = 'AQDlo/ZhNdDQHBAARJv2OfopA6hioIbDrJLcUA=='
874             self.out_map['CSI_CEPHFS_PROVISIONER_SECRET'] = 'AQDlo/ZhkJP2DxAAqcP7mm/5dU4+hFlzRgYGHQ=='
========================================================================= 
Now execute the script
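
For workaround 1, a minimal sketch of deleting the existing CSI users from the toolbox so the script can recreate them (assumes the default user names shown above; deleting users that an active cluster is still consuming is disruptive, so only do this if that is acceptable):

sh-4.4$ ceph auth del client.csi-rbd-node
sh-4.4$ ceph auth del client.csi-rbd-provisioner
sh-4.4$ ceph auth del client.csi-cephfs-node
sh-4.4$ ceph auth del client.csi-cephfs-provisioner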



Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproducible?
2/2

Can this issue reproduce from the UI?
NA

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install ODF 4.10 
2. Download the ceph-external-cluster-details-exporter.py using the command:
oc get csv $(oc get csv -n openshift-storage | grep ocs-operator | awk '{print $1}') -n openshift-storage -o jsonpath='{.metadata.annotations.external\.features\.ocs\.openshift\.io/export-script}' | base64 --decode > ceph-external-cluster-details-exporter.py
3. Copy the Python script to the toolbox pod:
oc cp ceph-external-cluster-details-exporter.py <tool-box-pod-name>:/
4. Execute the script in the toolbox to get the JSON blob:
oc exec -n openshift-storage <tool-box-pod-name> -- python3 ceph-external-cluster-details-exporter.py --rbd-data-pool-name replicapool1 > ceph_credentials.json
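
Optionally, before running the script you can confirm that rook has already created the CSI users (which is what triggers the cap mismatch below); a quick check from the toolbox:

sh-4.4$ ceph auth ls | grep '^client.csi'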



Actual results:
Execution error:
 --rbd-data-pool-name replicapool > ceph_credentials.json
Traceback (most recent call last):
  File "create-external-cluster-resources.py", line 1191, in <module>
    raise err
  File "create-external-cluster-resources.py", line 1188, in <module>
    rjObj.main()
  File "create-external-cluster-resources.py", line 1169, in main
    generated_output = self.gen_json_out()
  File "create-external-cluster-resources.py", line 898, in gen_json_out
    self._gen_output_map()
  File "create-external-cluster-resources.py", line 862, in _gen_output_map
    self.out_map['CSI_RBD_NODE_SECRET_SECRET'] = self.create_cephCSIKeyring_RBDNode()
  File "create-external-cluster-resources.py", line 755, in create_cephCSIKeyring_RBDNode
    "Error: {}".format(err_msg if ret_val != 0 else self.EMPTY_OUTPUT_LIST))
__main__.ExecutionFailureException: 'auth get-or-create client.csi-rbd-node' command failed
Error: key for client.csi-rbd-node exists but cap mon does not match

Expected results:
JSON blob

Additional info:

Comment 2 Subham Rai 2022-01-31 12:03:05 UTC
Hi, 

We'll always get these errors when the ceph users are already present and we try to create the same user with different caps.
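
To illustrate with the plain ceph CLI (a sketch; the caps below are representative only, not the exact ones rook requests): `ceph auth get-or-create` with caps fails when the entity already exists with different caps, whereas a plain get just returns the existing entry.

sh-4.4$ ceph auth get-or-create client.csi-rbd-node mon 'allow r' osd 'profile rbd'
Error: key for client.csi-rbd-node exists but cap mon does not match
sh-4.4$ ceph auth get client.csi-rbd-node     # returns the existing key and caps unchanged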

Also, IIRC, you are trying to run the script in the toolbox, where we already have the ODF cluster and storage cluster. This will throw errors because, in a normal (non-external) cluster, rook creates these users with different caps, which is not expected: if I'm not wrong, the aim of the python script is to run on an RHCS cluster, where we don't expect these ceph users to already be present.

The current workaround is to delete the Ceph users and let the script create them.

These are what we discussed in the call we had today morning. Please correct me or add something if I missed it. 


In my opinion, this could be a feature request/RFE to support running the script in the toolbox as well, or to handle the case where these users are already present.

Comment 3 Parth Arora 2022-01-31 13:58:32 UTC
I am working on an upgrade --flag functionality for the RHCS cluster (BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2044983, PR: https://github.com/rook/rook/pull/9609), for clusters that will upgrade from 4.9 to 4.10.

I can also add an auto-upgrade-users functionality so that the users created in ODF by rook can take advantage of the upgrade and be auto-upgraded with the new permissions.
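
For illustration, upgrading the caps of an already-existing user boils down to something like the following (a sketch only, mirroring the 'osd blocklist' cap verified later in this bug; note that `ceph auth caps` replaces the whole cap set, so every existing cap must be restated):

sh-4.4$ ceph auth caps client.csi-rbd-node mon "profile rbd, allow command 'osd blocklist'" osd "profile rbd"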

As a workaround, you can also delete the already created users.

Comment 8 Parth Arora 2022-02-04 12:42:12 UTC
Part of: https://github.com/rook/rook/pull/9703

Comment 9 Mudit Agarwal 2022-02-08 11:45:29 UTC
Parth, you need to cherry-pick the changes to https://github.com/red-hat-storage/rook/tree/release-4.10 before moving the BZ to MODIFIED

Comment 12 Parth Arora 2022-02-09 16:21:26 UTC
> Neha, the problem is that we use the `get-or-create` command for creating users, so I will separate it and perform the get and create steps separately for the users. This will also help if we have any future updates.
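
A rough sketch of the idea in shell (illustrative only, not the actual change in the PR; the caps are representative): check whether the user already exists and only create it when missing, instead of relying on a single get-or-create that errors out when the existing caps differ.

if ceph auth get client.csi-rbd-node >/dev/null 2>&1; then
    ceph auth get-key client.csi-rbd-node    # user exists: just fetch its key
else
    ceph auth get-or-create client.csi-rbd-node mon 'profile rbd' osd 'profile rbd'
fi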

We also need backports of these changes to older versions (4.9, 4.8, ..) for as long as we support RHCS, so I am creating clone BZs for that.
We need them because the check for already-existing users is missing in previous versions.

Comment 16 Mudit Agarwal 2022-03-03 10:00:38 UTC
Please add doc text

Comment 18 Vijay Avuthu 2022-03-22 09:13:03 UTC
Update:
========

The following scenarios were tested:

1. Direct deployment of ODF 4.10 ( external cluster - doesn't have any csi users )

    deploy job successful: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3585/console

    verified csi users are created with new caps ( osd blocklist )

   client.csi-cephfs-node
	key: AQA7kjliygK0MBAAVSOb/P4oBlHWiHKwxCjYGw==
	caps: [mds] allow rw
	caps: [mgr] allow rw
	caps: [mon] allow r, allow command 'osd blocklist'
	caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
	key: AQA7kjli5yfpMBAA9kgjzsY5RSYC86U+b96F6g==
	caps: [mgr] allow rw
	caps: [mon] allow r, allow command 'osd blocklist'
	caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
	key: AQA7kjli0bQ5MBAABR7f3d+9pEDRNuXzKC9nOA==
	caps: [mon] profile rbd, allow command 'osd blocklist'
	caps: [osd] profile rbd
client.csi-rbd-provisioner
	key: AQBGkjliPXb5BRAAYoqHF5MY70mqsTT9eYYy7A==
	caps: [mgr] allow rw
	caps: [mon] profile rbd, allow command 'osd blocklist'
	caps: [osd] profile rbd
client.healthchecker


2. External cluster has csi users from 4.9 and then deployment of ODF 4.10 
 
   deploy job successful: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/3587/console

   verified the same caps exist for the csi users after ODF 4.10 deployment

   client.csi-cephfs-node
	key: AQCzjzliDV7+KRAAPB8SGR2gD8cchK+iUeIgzw==
	caps: [mds] allow rw
	caps: [mgr] allow rw
	caps: [mon] allow r
	caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
	key: AQCzjzliRLYuKhAA7ek/Pjuct8tUxpFzm4DZIw==
	caps: [mgr] allow rw
	caps: [mon] allow r
	caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
	key: AQCzjzlir1SSKRAAPN5RnhFIXPCp3oeH12h56Q==
	caps: [mon] profile rbd
	caps: [osd] profile rbd
client.csi-rbd-provisioner
	key: AQCzjzli2mTMKRAAk8oNubwgKGqdzm2UE5ZqZw==
	caps: [mgr] allow rw
	caps: [mon] profile rbd
	caps: [osd] profile rbd

Comment 19 Vijay Avuthu 2022-03-31 04:25:03 UTC
upgrade from ocs-registry:4.9.5-4 to ocs-registry:4.10.0-210 is successful

Before starting the upgrade, below are the csi users:

client.csi-cephfs-node
	key: AQBNGUVi89jxDxAAaN016yQ1fadOm6YsA5saGA==
	caps: [mds] allow rw
	caps: [mgr] allow rw
	caps: [mon] allow r
	caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
	key: AQBNGUVigFgfExAAYxQw0DIf+tHwcrKQ3uJi8A==
	caps: [mgr] allow rw
	caps: [mon] allow r
	caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
	key: AQBNGUVieixTCxAAZckCVach2PSE33OlGrmb+Q==
	caps: [mon] profile rbd
	caps: [osd] profile rbd
client.csi-rbd-provisioner
	key: AQBNGUViV3RbDRAA18MO0FlHw+xExiVpvQUEuw==
	caps: [mgr] allow rw
	caps: [mon] profile rbd
	caps: [osd] profile rbd

after upgrade ( python /tmp/external-cluster-details-exporter-hfwbxk23.py --upgrade )

client.csi-cephfs-node
	key: AQBNGUVi89jxDxAAaN016yQ1fadOm6YsA5saGA==
	caps: [mds] allow rw
	caps: [mgr] allow rw
	caps: [mon] allow r, allow command 'osd blocklist'
	caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
	key: AQBNGUVigFgfExAAYxQw0DIf+tHwcrKQ3uJi8A==
	caps: [mgr] allow rw
	caps: [mon] allow r, allow command 'osd blocklist'
	caps: [osd] allow rw tag cephfs metadata=*
client.csi-rbd-node
	key: AQBNGUVieixTCxAAZckCVach2PSE33OlGrmb+Q==
	caps: [mon] profile rbd, allow command 'osd blocklist'
	caps: [osd] profile rbd
client.csi-rbd-provisioner
	key: AQBNGUViV3RbDRAA18MO0FlHw+xExiVpvQUEuw==
	caps: [mgr] allow rw
	caps: [mon] profile rbd, allow command 'osd blocklist'
	caps: [osd] profile rbd

Marking as verified

Comment 21 errata-xmlrpc 2022-04-13 18:52:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372

