Bug 1695850

Summary: ceph-ansible containerized Ceph MDS is limited to 1 CPU core by default - not enough
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 3.2
Target Milestone: rc
Target Release: 3.3
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Ben England <bengland>
Assignee: Dimitri Savineau <dsavinea>
QA Contact: Vasishta <vashastr>
Docs Contact: Aron Gunn <agunn>
CC: agunn, aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, edonnell, gmeno, nthomas, sankarshan, sweil, tchandra, tserlin, ymane
Fixed In Version: RHEL: ceph-ansible-3.2.16-1.el7cp; Ubuntu: ceph-ansible_3.2.16-2redhat1
Doc Type: Bug Fix
Doc Text:
.An increase to the CPU allocation for containerized Ceph MDS deployments
Previously, for container-based deployments, the CPU allocation for the Ceph MDS daemons was set to `1` as the default. In some scenarios, this caused slow performance when compared to a bare-metal deployment. With this release, the Ceph MDS daemon CPU allocation default is `4`.
Last Closed: 2019-08-21 15:10:49 UTC
Type: Bug
Bug Blocks: 1726135    

Description Ben England 2019-04-03 19:30:41 UTC
Description of problem:

CephFS metadata-intensive workloads, such as those involving large directories and many small files, perform far slower on containerized Ceph than on non-containerized Ceph. Ideally there should be no significant difference between containerized and non-containerized Ceph for almost all workloads.

Version-Release number of selected component (if applicable):

RHCS 3.2 (and RHCS 4)

How reproducible:

Should be every time. I haven't compared directly, but I have measured ceph-mds CPU consumption many times for such workloads, and it is typically well above 1 core, usually more like 3-4 cores.

Steps to Reproduce:
1. Install containerized Ceph.
2. Run a metadata-intensive workload where the Ceph MDS is the bottleneck.
3. Observe the CPU consumption of the ceph-mds process (see the sketch after this list).
4. Compare it to the CPU consumption of ceph-mds in non-containerized Ceph.
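For step 3, the play below is a minimal sketch of one way to sample ceph-mds CPU usage across the MDS hosts. It is not part of ceph-ansible; it assumes an existing inventory with an mdss group and the sysstat package (for pidstat) installed on those hosts.

# check-mds-cpu.yml -- hypothetical helper playbook, not shipped with ceph-ansible
- hosts: mdss
  gather_facts: false
  tasks:
    - name: Sample ceph-mds CPU usage over 60 seconds
      # Containerized ceph-mds processes are still visible in the host PID
      # namespace, so pgrep on the host finds them.
      shell: pidstat -u -p "$(pgrep -d, ceph-mds)" 60 1
      register: mds_cpu
      changed_when: false

    - name: Show the sampled CPU figures (values pinned near 100% suggest the 1-core limit is the bottleneck)
      debug:
        var: mds_cpu.stdout_lines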

Actual results:

A containerized-Ceph CephFS user can expect roughly 1/2 to 1/3 of the metadata performance of bare-metal Ceph.

Expected results:

Users should get performance resembling bare-metal Ceph with containerized Ceph, without tuning, in most cases.

Additional info:

See https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mds/defaults/main.yml#L23

This shows that the default CPU cgroup limit is 1 core.
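For reference, the setting in question looks roughly like this (paraphrased excerpt; the variable name is taken from the linked file and may differ between ceph-ansible versions):

# roles/ceph-mds/defaults/main.yml (excerpt, paraphrased)
ceph_mds_docker_cpu_limit: 1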

I think this should default to 4 CPU cores since I have seen Ceph-MDS CPU utilization get that high for metadata-intensive workloads (such as smallfile), and there is typically only one such process per host.  But something > 2 would be satisfactory.

Hyperconverged environments could elect to lower it to free up cores for other processes, but I doubt they will, since this is an *upper bound* on CPU consumption, and if the MDS is not being used then the CPU resources are available for others to use.
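Until the default changes, a deployment can raise the limit itself. A minimal sketch of such an override, assuming the ceph_mds_docker_cpu_limit variable name from the linked defaults file and the usual group_vars layout:

# group_vars/mdss.yml -- sketch of a per-deployment override
ceph_mds_docker_cpu_limit: 4

Because this is only an upper bound on the MDS container's CPU cgroup quota, raising it does not reserve cores; an idle MDS still leaves the CPUs available to other processes on the host.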

Workloads to reproduce this problem are available at:

https://github.com/distributed-system-analysis/smallfile

Comment 1 Ben England 2019-04-04 18:48:47 UTC
I should have assigned this to Ceph-Ansible; sorry, Patrick.

Comment 9 errata-xmlrpc 2019-08-21 15:10:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2538