Bug 1820346

Summary: Setting 'ceph.quota.max_bytes' fails

Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Version: 3.3
Reporter: Goutham Pacha Ravi <gouthamr>
Assignee: Kotresh HR <khiremat>
QA Contact: subhash <vpoliset>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: 4.*
Hardware: Unspecified
OS: Unspecified
CC: ceph-eng-bugs, gcharot, gfarnum, gfidente, hyelloji, sweil, tbarron, zyan
Type: Bug
Last Closed: 2020-04-14 18:12:25 UTC

Description Goutham Pacha Ravi 2020-04-02 19:42:33 UTC
Description of problem:

When using OpenStack Manila, the CephFS share driver sets the filesystem extended attribute "ceph.quota.max_bytes" to constrain the size of CephFS "shares" (directories in the CephFS filesystem).

This problem was discovered with an "external" RHCS 3 (Luminous) Ceph cluster, not RHOSP Director-deployed Ceph. The OpenStack cluster version is RHOSP 16 (so the client is Nautilus).

Manila interacts with CephFS via the ceph_volume_client [1]. When it uses "setxattr" to set the quota attribute, this is the Python traceback:

2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server [req-cf3f6879-9b14-4420-8ec7-b06a175d2e1a 1864d2f529fa4e36962d40f14d414b4c 2a40cb11dd6b44efa9a0d88c856681fe - - -] Exception during message handling: cephfs.OperationNotSupported: [Errno 95] error in setxattr
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/manager.py", line 187, in wrapped
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     return f(self, *args, **kwargs)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/utils.py", line 568, in wrapper
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     return func(self, *args, **kwargs)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/manager.py", line 1790, in create_share_instance
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     exception=e)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     self.force_reraise()
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     raise value
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/manager.py", line 1753, in create_share_instance
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     context, share_instance, share_server=share_server)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/manila/share/drivers/cephfs/driver.py", line 272, in create_share
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     mode=self._cephfs_volume_mode)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/ceph_volume_client.py", line 660, in create_volume
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server     self.fs.setxattr(path, 'ceph.quota.max_bytes', to_bytes(size), 0)
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server   File "cephfs.pyx", line 1087, in cephfs.LibCephFS.setxattr
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server cephfs.OperationNotSupported: [Errno 95] error in setxattr
2020-04-01 21:35:20.328 43 ERROR oslo_messaging.rpc.server
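
For reference, the failing call reduces to a single libcephfs setxattr on the share directory. A minimal reproducer sketch, assuming a Nautilus python3-cephfs client whose ceph.conf and keyring point at the Luminous cluster (the path and quota value below are hypothetical):

import cephfs

# Connect as the configured client; assumes /etc/ceph/ceph.conf carries the
# monitor addresses and keyring (hypothetical setup).
fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()
try:
    # Against a Luminous MDS this raises:
    #   cephfs.OperationNotSupported: [Errno 95] error in setxattr
    fs.setxattr('/volumes/_nogroup/share-01', 'ceph.quota.max_bytes',
                str(1024 ** 3).encode(), 0)  # 1 GiB quota
finally:
    fs.shutdown()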

Version-Release number of selected component (if applicable):

The Ceph OSD/MDS version is 12.2.12-84.el7cp (1ce826ed564c8063ac6c876df66bd8ab31b6cc66) luminous (stable).
The Ceph client version is 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable).

Kernel version on the nodes running the Ceph 3 cluster: 3.10.0-957.12.2.el7.x86_64
Operating system on the nodes running the Ceph 3 cluster: Red Hat Enterprise Linux Server release 7.7 (Maipo)

Keyring of the client user:

[client.manila]
	key = AQDQdd1cAAAAABAA0aXFFTnjH9aO69P0iVvYyg==
	caps mds = "allow *"
	caps mgr = "allow *"
	caps mon = "allow r, allow command 'auth del', allow command 'auth caps', allow command 'auth get', allow command 'auth get-or-create'"
	caps osd = "allow rw"
This bug isn't specific to RHOSP; it occurs when using the ceph_volume_client from RHCS 4 against an RHCS 3 cluster.


Additional info:

[1] https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py

Comment 1 RHEL Program Management 2020-04-02 19:42:39 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Goutham Pacha Ravi 2020-04-02 22:14:35 UTC
A while ago, some users running the Ceph Mimic and Nautilus releases reported this as well: https://bugs.launchpad.net/manila/+bug/1838936

I'd like to know: is this a known issue, or a mismatch of client expectations?

Comment 12 Yan, Zheng 2020-04-09 11:44:29 UTC
There is a quota format change between Luminous and Nautilus. Nautilus libcephfs gets this error if it connects to a Luminous MDS.

Comment 14 Greg Farnum 2020-04-09 13:57:29 UTC
(In reply to Yan, Zheng from comment #12)
> There is a quota format change between Luminous and Nautilus. Nautilus
> libcephfs gets this error if it connects to a Luminous MDS.

Zheng, does this incompatibility mean 1) Nautilus clients won't respect quotas on a Luminous cluster, or merely 2) Nautilus clients can't update quotas on a Luminous cluster?

If (1), this seems like a serious compatibility oversight and I expect we'll need to finagle a Client update.

Comment 15 Yan, Zheng 2020-04-10 07:53:43 UTC
(In reply to Greg Farnum from comment #14)
> (In reply to Yan, Zheng from comment #12)
> > There is a quota format change between Luminous and Nautilus. Nautilus
> > libcephfs gets this error if it connects to a Luminous MDS.
> 
> Zheng, does this incompatibility mean 1) Nautilus clients won't respect
> quotas on a Luminous cluster, or merely 2) Nautilus clients can't update
> quotas on a Luminous cluster?
> 
> If (1), this seems like a serious compatibility oversight and I expect we'll
> need to finagle a Client update.


It's (1), unfortunately.

Comment 18 Yan, Zheng 2020-04-14 12:09:00 UTC
Because quota was an experimental feature in Luminous. The original format did not suit the kernel client, so we changed the format (quota-enabled inodes must have a snaprealm) and removed the experimental flag in Mimic.
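
A minimal sketch of how a caller such as ceph_volume_client could translate this failure into a clearer error until it runs against a Mimic-or-newer MDS (the set_quota wrapper below is hypothetical, not the driver's actual code):

import cephfs

def set_quota(fs, path, max_bytes):
    # Set ceph.quota.max_bytes on a directory. A Nautilus client gets
    # EOPNOTSUPP (errno 95) from a Luminous MDS because, from Mimic on,
    # quota-enabled inodes must carry a snaprealm.
    try:
        fs.setxattr(path, 'ceph.quota.max_bytes', str(max_bytes).encode(), 0)
    except cephfs.OperationNotSupported:
        raise RuntimeError('MDS too old for this client quota format; '
                           'quotas require a Mimic or newer MDS')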

Comment 19 Greg Farnum 2020-04-14 18:12:25 UTC
Oh of course, thanks Zheng.

Given that quotas were experimental in Luminous, we are allowed to break this compatibility, and since this was found in internal testing, I think it's NOTABUG.