Bug 1695850 - ceph-ansible containerized Ceph MDS is limited to 1 CPU core by default - not enough
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 3.3
Assignee: Dimitri Savineau
QA Contact: Vasishta
Docs Contact: Aron Gunn
URL:
Whiteboard:
Depends On:
Blocks: 1726135
Reported: 2019-04-03 19:30 UTC by Ben England
Modified: 2019-08-21 15:11 UTC
CC List: 13 users

Fixed In Version: RHEL: ceph-ansible-3.2.16-1.el7cp Ubuntu: ceph-ansible_3.2.16-2redhat1
Doc Type: Bug Fix
Doc Text:
.An increase to the CPU allocation for containerized Ceph MDS deployments
Previously, for container-based deployments, the CPU allocation for the Ceph MDS daemons was set to `1` as the default. In some scenarios, this caused slow performance when compared to a bare-metal deployment. With this release, the Ceph MDS daemon CPU allocation default is `4`.
Clone Of:
Environment:
Last Closed: 2019-08-21 15:10:49 UTC
Embargoed:




Links
Github ceph/ceph-ansible pull 3903 (last updated 2019-04-23 19:58:00 UTC)
Red Hat Product Errata RHSA-2019:2538 (last updated 2019-08-21 15:11:02 UTC)

Description Ben England 2019-04-03 19:30:41 UTC
Description of problem:

CephFS metadata-intensive workloads, such as those involving large directories and small files, perform far slower with containerized Ceph than with non-containerized Ceph. Ideally there should be no significant difference between containerized and non-containerized Ceph for almost all workloads.

Version-Release number of selected component (if applicable):

RHCS 3.2 (and RHCS 4)

How reproducible:

Should be every time. I haven't compared the two directly, but I have measured ceph-mds CPU consumption many times for such workloads, and it is typically well above 1 core, usually more like 3-4 cores.

Steps to Reproduce:
1. Install containerized Ceph.
2. Run a metadata-intensive workload where the Ceph MDS is the bottleneck.
3. Observe the CPU consumption of the ceph-mds process.
4. Compare to the CPU consumption of ceph-mds in non-containerized Ceph.

Actual results:

Containerized-Ceph CephFS users can be expected to get 1/2 to 1/3 of the metadata performance of bare-metal Ceph.

Expected results:

User should get something resembling bare-metal Ceph performance with containerized Ceph, without tuning, in most cases.

Additional info:

See https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-mds/defaults/main.yml#L23

This shows that the default CPU cgroup limit for the MDS container is 1 core.
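
For reference, the relevant default looks approximately like the following (an illustrative excerpt; the exact file content may differ between ceph-ansible versions):

  # roles/ceph-mds/defaults/main.yml (approximate excerpt)
  # Upper bound on CPU cores available to the containerized MDS
  ceph_mds_docker_cpu_limit: 1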

I think this should default to 4 CPU cores, since I have seen ceph-mds CPU utilization get that high for metadata-intensive workloads (such as smallfile), and there is typically only one such process per host. But anything > 2 would be satisfactory.

Hyperconverged environments could elect to lower it to free up cores for other processes, but I doubt they will, since this is an *upper bound* on CPU consumption, and if the MDS is not being used then the CPU resources are available for others to use.
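
For operators who do want to raise (or lower) the limit today, a minimal sketch of an override would be the following, assuming the usual ceph-ansible group_vars layout (the file path is illustrative; any group_vars file applied to the MDS hosts works):

  # group_vars/mds.yml (illustrative path)
  # Allow the containerized MDS to use up to 4 CPU cores
  ceph_mds_docker_cpu_limit: 4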

Workloads to reproduce this problem are available from:

https://github.com/distributed-system-analysis/smallfile

Comment 1 Ben England 2019-04-04 18:48:47 UTC
I should assign this to Ceph-Ansible; sorry, Patrick.

Comment 9 errata-xmlrpc 2019-08-21 15:10:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2538

