Description of problem:

Clone operations fail with an AssertionError once a large number of clones has been created (130 in this case):

[root@ceph-amk-bz-2-qa3ps0-node7 _nogroup]# ceph fs clone status cephfs clone_status_142
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 96, in get_subvolume_object
    self.upgrade_to_v2_subvolume(subvolume)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 57, in upgrade_to_v2_subvolume
    version = int(subvolume.metadata_mgr.get_global_option('version'))
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/metadata_manager.py", line 144, in get_global_option
    return self.get_option(MetadataManager.GLOBAL_SECTION, key)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/metadata_manager.py", line 138, in get_option
    raise MetadataMgrException(-errno.ENOENT, "section '{0}' does not exist".format(section))
volumes.fs.exception.MetadataMgrException: -2 (section 'GLOBAL' does not exist)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1446, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/volumes/module.py", line 437, in handle_command
    return handler(inbuf, cmd)
  File "/usr/share/ceph/mgr/volumes/module.py", line 34, in wrap
    return f(self, inbuf, cmd)
  File "/usr/share/ceph/mgr/volumes/module.py", line 682, in _cmd_fs_clone_status
    vol_name=cmd['vol_name'], clone_name=cmd['clone_name'], group_name=cmd.get('group_name', None))
  File "/usr/share/ceph/mgr/volumes/fs/volume.py", line 622, in clone_status
    with open_subvol(self.mgr, fs_handle, self.volspec, group, clonename, SubvolumeOpType.CLONE_STATUS) as subvolume:
  File "/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/share/ceph/mgr/volumes/fs/operations/subvolume.py", line 72, in open_subvol
    subvolume = loaded_subvolumes.get_subvolume_object(mgr, fs, vol_spec, group, subvolname)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 101, in get_subvolume_object
    self.upgrade_legacy_subvolume(fs, subvolume)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 78, in upgrade_legacy_subvolume
    assert subvolume.legacy_mode
AssertionError

Version-Release number of selected component (if applicable):

[root@ceph-amk-bz-1-wu0ar7-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 20
    }
}

How reproducible:
1/1

Steps to Reproduce:
1. Create a subvolume group:
   ceph fs subvolumegroup create cephfs subvolgroup_clone_status_1
2. Create a subvolume:
   ceph fs subvolume create cephfs subvol_clone_status --size 5368706371 --group_name subvolgroup_clone_status_1
3. Kernel-mount the volume and fill it with data.
4. Create a snapshot:
   ceph fs subvolume snapshot create cephfs subvol_clone_status snap_1 --group_name subvolgroup_clone_status_1
5. Create 200 clones from the snapshot:
   ceph fs subvolume snapshot clone cephfs subvol_clone_status snap_1 clone_status_1 --group_name subvolgroup_clone_status_1

Actual results:
ceph fs clone status fails with the AssertionError shown above.

Expected results:
Should fail gracefully.

Additional info:
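The clone-creation step above can be scripted as a loop; the clone names clone_status_1 through clone_status_200 are assumed from the naming used in this report. Shown as a dry run that only prints the commands; remove the echo to run it against a test cluster.

```shell
# Dry run: print the 200 clone commands from the reproduction steps.
# Remove the leading "echo" to execute them against a test cluster.
for i in $(seq 1 200); do
  echo ceph fs subvolume snapshot clone cephfs subvol_clone_status snap_1 \
    "clone_status_${i}" --group_name subvolgroup_clone_status_1
done
```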
Tested up to 122 clones; the issue is not seen there:

[root@ceph-amk-bootstrap-clx3kj-node7 ~]# ceph fs clone status cephfs clone_status_122
{
  "status": {
    "state": "complete"
  }
}
[root@ceph-amk-bootstrap-clx3kj-node7 ~]# ceph fs subvolume snapshot info cephfs subvol_clone_status snap_1 --group_name subvolgroup_clone_status_1
{
    "created_at": "2023-01-23 08:28:43.445930",
    "data_pool": "cephfs.cephfs.data",
    "has_pending_clones": "no"
}

The cluster has reached full capacity with all the clones:

[root@ceph-amk-bootstrap-clx3kj-node7 ~]# ceph -s
  cluster:
    id:     fa611392-9af1-11ed-be7c-fa163e696b4f
    health: HEALTH_ERR
            2 backfillfull osd(s)
            1 full osd(s)
            3 nearfull osd(s)
            Degraded data redundancy: 632/595707 objects degraded (0.106%), 81 pgs degraded
            Full OSDs blocking recovery: 81 pgs recovery_toofull
            4 pool(s) full

  services:
    mon: 3 daemons, quorum ceph-amk-bootstrap-clx3kj-node1-installer,ceph-amk-bootstrap-clx3kj-node2,ceph-amk-bootstrap-clx3kj-node3 (age 42m)
    mgr: ceph-amk-bootstrap-clx3kj-node1-installer.aqnxiy(active, since 34h), standbys: ceph-amk-bootstrap-clx3kj-node2.glqgyi
    mds: 2/2 daemons up, 1 standby
    osd: 12 osds: 12 up (since 42m), 12 in (since 42m)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 321 pgs
    objects: 198.57k objects, 49 GiB
    usage:   150 GiB used, 30 GiB / 180 GiB avail
    pgs:     632/595707 objects degraded (0.106%)
             240 active+clean
             81  active+recovery_toofull+degraded

  io:
    client:   85 B/s rd, 341 B/s wr, 0 op/s rd, 0 op/s wr

  progress:
    Global Recovery Event (23h)
      [====================........] (remaining: 7h)

Versions:

[root@ceph-amk-bootstrap-clx3kj-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.10-103.el8cp (4a5dd59c2e6616f05cc94e6aab2bddf1339ca4f4) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-103.el8cp (4a5dd59c2e6616f05cc94e6aab2bddf1339ca4f4) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-103.el8cp (4a5dd59c2e6616f05cc94e6aab2bddf1339ca4f4) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.10-103.el8cp (4a5dd59c2e6616f05cc94e6aab2bddf1339ca4f4) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.10-103.el8cp (4a5dd59c2e6616f05cc94e6aab2bddf1339ca4f4) pacific (stable)": 20
    }
}
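The AssertionError in the traceback comes from upgrade_legacy_subvolume asserting subvolume.legacy_mode after the clone's metadata file turned out to lack its GLOBAL section. A minimal sketch of the failure mode, using stub classes rather than the actual mgr code: the Subvolume and MetadataMgrException stubs and the upgrade_legacy_subvolume_graceful variant are illustrative assumptions, showing how a typed exception (mapped to a clean EINVAL) would "fail gracefully" where the bare assert currently surfaces a traceback.

```python
import errno

class MetadataMgrException(Exception):
    """Stub modelled on volumes.fs.exception.MetadataMgrException."""
    def __init__(self, error, message):
        self.error = error
        super().__init__(message)

class Subvolume:
    """Stub: only the legacy_mode flag checked by the assert."""
    def __init__(self, legacy_mode):
        self.legacy_mode = legacy_mode

def upgrade_legacy_subvolume(subvolume):
    # Current behaviour (simplified): a subvolume that is not in legacy
    # mode trips this assert and surfaces as a raw traceback to the CLI.
    assert subvolume.legacy_mode

def upgrade_legacy_subvolume_graceful(subvolume):
    # Hypothetical graceful variant: raise a typed exception the volumes
    # plugin can translate into a clean error instead of an AssertionError.
    if not subvolume.legacy_mode:
        raise MetadataMgrException(-errno.EINVAL,
                                   "subvolume is not in legacy mode")

broken = Subvolume(legacy_mode=False)

try:
    upgrade_legacy_subvolume(broken)
except AssertionError:
    print("AssertionError")        # what this bug report shows

try:
    upgrade_legacy_subvolume_graceful(broken)
except MetadataMgrException as e:
    print(e.error)                 # -22 (EINVAL): a graceful failure
```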
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 5.3 Bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:0980