Bug 2130450 - [CephFS] Clone operations are failing with Assertion Error
Summary: [CephFS] Clone operations are failing with Assertion Error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 6.0
Assignee: Kotresh HR
QA Contact: Amarnath
Docs Contact: Masauso Lungu
URL:
Whiteboard:
Depends On:
Blocks: 2126050
 
Reported: 2022-09-28 07:36 UTC by Kotresh HR
Modified: 2023-07-04 14:48 UTC
CC: 8 users

Fixed In Version: ceph-17.2.3-45.el9cp
Doc Type: Bug Fix
Doc Text:
.The disk full scenario no longer corrupts the configuration file
Previously, configuration files were written directly to disk without using temporary files: the existing configuration file was truncated and the new configuration data was then written. When the disk was full, the truncate succeeded but the subsequent write failed with a `no space` error, leaving an empty configuration file and causing all operations on the corresponding subvolumes to fail. With this fix, the configuration data is written to a temporary file which is then renamed to the original configuration file, so the original file is never truncated.
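A minimal sketch of the write-to-a-temporary-file-then-rename pattern described above (illustrative only; the function name and file layout are assumptions, not the actual ceph-mgr code):

    import os
    import tempfile

    def write_config_atomically(config_path, config_data):
        # Write to a temporary file in the same directory so the final
        # rename stays on the same filesystem.
        dir_name = os.path.dirname(config_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".meta.tmp.")
        try:
            with os.fdopen(fd, "w") as tmp_file:
                tmp_file.write(config_data)
                tmp_file.flush()
                os.fsync(tmp_file.fileno())
            # The rename replaces the original only after the new data is
            # safely written; a failed write (e.g. ENOSPC) leaves the
            # original configuration file untouched.
            os.rename(tmp_path, config_path)
        except OSError:
            # Remove the partial temporary file and re-raise.
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise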
Clone Of:
Environment:
Last Closed: 2023-03-20 18:58:27 UTC
Embargoed:




Links
System ID Last Updated
Ceph Project Bug Tracker 55976 2022-09-28 07:36:46 UTC
Red Hat Issue Tracker RHCEPH-5374 2022-09-28 08:23:19 UTC
Red Hat Product Errata RHBA-2023:1360 2023-03-20 18:59:18 UTC

Description Kotresh HR 2022-09-28 07:36:47 UTC
This bug was initially created as a copy of Bug #2094822

I am copying this bug because: 



Description of problem:
Clone operations are failing with an Assertion Error.
This happens when a large number of clones are created; in this case 130 clones were created, and the clone status command then failed:

[root@ceph-amk-bz-2-qa3ps0-node7 _nogroup]# ceph fs clone status cephfs clone_status_142
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 96, in get_subvolume_object
    self.upgrade_to_v2_subvolume(subvolume)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 57, in upgrade_to_v2_subvolume
    version = int(subvolume.metadata_mgr.get_global_option('version'))
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/metadata_manager.py", line 144, in get_global_option
    return self.get_option(MetadataManager.GLOBAL_SECTION, key)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/metadata_manager.py", line 138, in get_option
    raise MetadataMgrException(-errno.ENOENT, "section '{0}' does not exist".format(section))
volumes.fs.exception.MetadataMgrException: -2 (section 'GLOBAL' does not exist)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1446, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/volumes/module.py", line 437, in handle_command
    return handler(inbuf, cmd)
  File "/usr/share/ceph/mgr/volumes/module.py", line 34, in wrap
    return f(self, inbuf, cmd)
  File "/usr/share/ceph/mgr/volumes/module.py", line 682, in _cmd_fs_clone_status
    vol_name=cmd['vol_name'], clone_name=cmd['clone_name'],  group_name=cmd.get('group_name', None))
  File "/usr/share/ceph/mgr/volumes/fs/volume.py", line 622, in clone_status
    with open_subvol(self.mgr, fs_handle, self.volspec, group, clonename, SubvolumeOpType.CLONE_STATUS) as subvolume:
  File "/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/share/ceph/mgr/volumes/fs/operations/subvolume.py", line 72, in open_subvol
    subvolume = loaded_subvolumes.get_subvolume_object(mgr, fs, vol_spec, group, subvolname)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 101, in get_subvolume_object
    self.upgrade_legacy_subvolume(fs, subvolume)
  File "/usr/share/ceph/mgr/volumes/fs/operations/versions/__init__.py", line 78, in upgrade_legacy_subvolume
    assert subvolume.legacy_mode
AssertionError




Version-Release number of selected component (if applicable):
[root@ceph-amk-bz-1-wu0ar7-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.8-27.el8cp (b0bd3a6c6f24d3ac855dde96982871257bef866f) pacific (stable)": 20
    }
}
[root@ceph-amk-bz-1-wu0ar7-node7 ~]# 


How reproducible:
1/1


Steps to Reproduce:

Create Subvolumegroup
    ceph fs subvolumegroup create cephfs subvolgroup_clone_status_1
Create Subvolume
    ceph fs subvolume create cephfs subvol_clone_status --size 5368706371 --group_name subvolgroup_clone_status_1
Kernel mount the volume and fill it with data
Create Snapshot
    ceph fs subvolume snapshot create cephfs subvol_clone_status snap_1 --group_name subvolgroup_clone_status_1
Create 200 clones of the above subvolume (a scripted version is sketched after these steps)
    ceph fs subvolume snapshot clone cephfs subvol_clone_status snap_1 clone_status_1 --group_name subvolgroup_clone_status_1
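
A minimal scripted version of the clone-creation and status-check steps, assuming the ceph CLI is available on the node; the volume, group, and snapshot names are taken from the steps above, and the loop bounds are illustrative:

    import subprocess

    VOL = "cephfs"
    SUBVOL = "subvol_clone_status"
    SNAP = "snap_1"
    GROUP = "subvolgroup_clone_status_1"

    def ceph(*args):
        # Thin wrapper around the ceph CLI; raises CalledProcessError on failure.
        result = subprocess.run(["ceph", *args], check=True,
                                capture_output=True, text=True)
        return result.stdout

    # Create 200 clones from the same snapshot.
    for i in range(1, 201):
        ceph("fs", "subvolume", "snapshot", "clone",
             VOL, SUBVOL, SNAP, f"clone_status_{i}", "--group_name", GROUP)

    # Poll clone status; the bug surfaced here once the subvolume
    # metadata file had been truncated on a full disk.
    for i in range(1, 201):
        print(ceph("fs", "clone", "status", VOL, f"clone_status_{i}"))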

Actual results:
Clone status operations fail with an AssertionError, as shown in the traceback above.

Expected results:
Clone operations should fail gracefully instead of raising an AssertionError.

Additional info:

Comment 22 Amarnath 2022-11-02 14:01:37 UTC
Hi Kotresh,

Created a subvolume with a small amount of data.
Created more than 150 clones from the subvolume.
Did not observe any errors.

Verified in Version: 
[root@ceph-amk-bz-2-8zczch-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 17.2.5-8.el9cp (f2be93d8b38077bd58e70cf252dbbb4cf49e95e4) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.5-8.el9cp (f2be93d8b38077bd58e70cf252dbbb4cf49e95e4) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.5-8.el9cp (f2be93d8b38077bd58e70cf252dbbb4cf49e95e4) quincy (stable)": 12
    },
    "mds": {
        "ceph version 17.2.5-8.el9cp (f2be93d8b38077bd58e70cf252dbbb4cf49e95e4) quincy (stable)": 3
    },
    "overall": {
        "ceph version 17.2.5-8.el9cp (f2be93d8b38077bd58e70cf252dbbb4cf49e95e4) quincy (stable)": 20
    }
}


A detailed document with all the commands:
https://docs.google.com/document/d/1VuR2PlYrUwDWk6Aw1kKGxX18HcUZZIQn92yZgkmNhbI/edit

Comment 31 errata-xmlrpc 2023-03-20 18:58:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1360

