Bug 1674017
| Summary: | user MUST NOT specify ceph_osd_docker_memory_limit, specify osd_memory_target | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Ben England <bengland> |
| Component: | Documentation | Assignee: | Bara Ancincova <bancinco> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | high | Docs Contact: | John Brier <jbrier> |
| Priority: | high | ||
| Version: | 3.2 | CC: | asriram, jbrier, jdurgin, jharriga, johfulto, kdreyer, mhackett, mnelson, pnguyen, tchandra, twilkins, vumrao |
| Target Milestone: | z2 | ||
| Target Release: | 3.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-05-13 16:19:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1685931 | ||
|
Description
Ben England
2019-02-08 18:23:07 UTC
Slight correction: the penalty for exceeding ceph_osd_docker_memory_limit is not a memory allocation failure; the penalty is that the OOM (out-of-memory) killer will terminate the process. A program might conceivably recover from a memory allocation failure in some way, but it can't recover from OOM process termination. Details are in: https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v1/memory.txt#L241

Bara, the section should be updated, not removed. The section on ceph_osd_docker_cpu_limit is still valid (we need sections for ceph_rgw_docker_cpu_limit and ceph_mds_docker_cpu_limit too, different bzs). It's just the part about memory that needs updating.

As for removing the memory limit entirely, there may be one exception to that -- hyperconverged infrastructure (HCI), where applications and other Ceph services have to be co-resident on the same physical hosts. In that case, I'd suggest setting ceph_osd_docker_memory_limit to 50% higher than the osd_memory_target ceph.conf parameter, so that if some daemon grows really excessively, it can be stopped before it triggers an OOM (out-of-memory) kill that could affect other services or applications. This is a more sane use of the memory limit than before, and was suggested in rook.io discussions concerning memory CGroup limits.

Bara, you are correct, osd_memory_target goes specifically in the ceph.conf section of all.yml, so to set it you need something in all.yml like the example below -- here we are asking for each OSD to be limited to 6 GB (not GiB) of memory. Normally the user should not have to set ceph_osd_docker_memory_limit, but for this example, suppose you wanted to override the Docker memory CGroup limit to be more constraining than it is by default for an HCI configuration. Using the 50% rule above, that would be 9 GB.
    ceph_conf_overrides:
      osd:
        osd_memory_target: 6000000000
    ceph_osd_docker_memory_limit: 9g
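The 50% headroom rule above can be sketched as a small helper. This is only an illustration of the arithmetic from the comment (the function name is hypothetical; only the 1.5x multiplier and the decimal-GB values come from the example above):

```python
def suggested_docker_memory_limit(osd_memory_target_bytes: int) -> int:
    """Suggested CGroup memory limit for an HCI deployment:
    50% above osd_memory_target, so a runaway OSD container can be
    stopped before the kernel OOM killer affects co-resident services.
    """
    return int(osd_memory_target_bytes * 1.5)

# 6 GB (decimal) osd_memory_target -> 9 GB limit, matching the example above
print(suggested_docker_memory_limit(6_000_000_000))  # 9000000000
```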
As for FileStore, this limit does not apply there, but FileStore is gradually being phased out, and a migration procedure should be released with RHCS 4 to move existing FileStore customer sites to BlueStore, so I'm not too worried about it.