Bug 2203429
| Summary: | Configuring resource Isolation on Hyperconverged Nodes does not work | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | daniel.jameson.1 |
| Component: | ceph-ansible | Assignee: | Guillaume Abrioux <gabrioux> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Yogev Rabl <yrabl> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 16.2 (Train) | CC: | bshephar, eharney, fpantano, gfidente, jjoyce, johfulto, jschluet, slinaber, tvignaud |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-06-12 11:58:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
daniel.jameson.1
2023-05-12 16:13:26 UTC
(In reply to daniel.jameson.1 from comment #0)

> Description of problem:
> Followed Redhat's documentation for creating templates for resource
> isolation on a hyper converged node, redeployed Openstack, restarted all
> nodes, Ceph-OSDs alone were using resources that were out of band from the
> resource pool that was defined.

How did you determine they were out of band? E.g. did you look at the value of osd_memory_target on the HCI node or run some other command?

> Version-Release number of selected component (if applicable):
> Openstack 16.2.4

What version of ceph-ansible were you using? Can you share the output of running `rpm -q ceph-ansible` on the undercloud?

> Steps to Reproduce:
> 1. Create or modify templates for resource isolation per documentation above:
>
> [stack@undercloud ~]$ cat templates/hci-resource.yaml
> parameter_defaults:
>   ComputeHCIParameters:
>     NovaReservedHostMemory: 26624
>
> [stack@undercloud ~]$ cat templates/storage-container-config.yaml
> parameter_defaults:
>   CephAnsibleExtraConfig:
>     is_hci: true

I see you're using "is_hci: true". That's good. More on that below regarding osd_memory_target.

> Actual results: Ceph-OSDs were taking up 32GB of RAM, the Ceph-OSDs and Nova
> Overhead should have only been given 26GB of RAM for resource isolation

What command did you use to determine that they were using 32 GB of RAM?

Why should it be 26 GB? Do you think it's because you set `NovaReservedHostMemory: 26624`? If so, that's not exactly how it works. NovaReservedHostMemory tells the Nova scheduler not to schedule VMs on an HCI node that would require the last 26624 MB. I.e. if the Nova scheduler sees a host with X GB of RAM and wants to determine whether it can run a VM there, it must count the available resources on that host not as having X GB of RAM but as having X-26 GB of RAM.

So just because the Nova scheduler reserves that memory doesn't mean it puts an upper bound on the Ceph OSDs. We want to tell the Nova scheduler not to use memory the OSDs would use (so NovaReservedHostMemory is the right thing to do), but to limit the OSD memory we set osd_memory_target. More on that below.

> Expected results:
> Ceph-OSDs and Nova Overhead being capped at 26GB RAM.
>
> Additional info:
>
> Had to do a workaround of setting the ceph osd memory target directly to
> resolve this issue:
>
> [stack@undercloud ~]$ cat templates/ceph-resource.yaml
> parameter_defaults:
>   CephConfigOverrides:
>     osd:
>       osd_memory_target: 2147483648

This is along the right lines. Let's look at how osd_memory_target is set by ceph-ansible when is_hci is true.

https://github.com/ceph/ceph-ansible/pull/3113/files

As you can see in the PR above, the total host memory is multiplied by a safety_factor that differs depending on is_hci. That value is then divided by the number of OSDs, and the result is used as the osd_memory_target. If you don't agree with that calculation, you don't have to use it; e.g. you can directly override osd_memory_target in ceph.conf as you have done.

So I'm curious what value for osd_memory_target ceph-ansible computed in your environment. Would you please provide it? Is it in line with the above calculation?

Hi Daniel,

It's been 6 days and I haven't heard from you. I'll keep this bug report open for another week.

John

Since it's been another week and I haven't heard back, I'm going to close this bug. If you want to re-open it, please provide the requested info.
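For context on the calculation discussed above, here is a minimal sketch in Python (not ceph-ansible's actual Jinja2), assuming the defaults I believe ceph-ansible uses: a hci_safety_factor of 0.2 when is_hci is true and a non_hci_safety_factor of 0.7 otherwise. The function name and the factor values are illustrative assumptions and should be checked against the ceph-ansible version shipped with your release.

```python
# Sketch of the osd_memory_target estimate described above (not ceph-ansible's
# actual code). The 0.2 / 0.7 safety factors mirror what I believe are the
# hci_safety_factor / non_hci_safety_factor defaults; treat them as assumptions.

def estimated_osd_memory_target(total_mem_mb: int, num_osds: int,
                                is_hci: bool = True) -> int:
    """Return an estimated per-OSD memory target in bytes."""
    safety_factor = 0.2 if is_hci else 0.7
    total_mem_bytes = total_mem_mb * 1048576  # MB -> bytes
    return int(total_mem_bytes * safety_factor / num_osds)

# Example: a 256 GiB HCI node with 12 OSDs works out to roughly 4.3 GiB per OSD.
print(estimated_osd_memory_target(total_mem_mb=262144, num_osds=12))
# -> 4581298449
```

To compare this estimate with what was actually applied on the HCI node, checking the effective value with something like `ceph daemon osd.<id> config get osd_memory_target` inside the OSD container, or inspecting the generated ceph.conf, should show the number ceph-ansible computed.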