Bug 1687446

Summary: Include numactl in overcloud image (so we may start ceph daemons with numactl)
Product: Red Hat OpenStack
Reporter: John Fulton <johfulto>
Component: openstack-tripleo-puppet-elements
Assignee: John Fulton <johfulto>
Status: CLOSED CURRENTRELEASE
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Docs Contact:
Priority: medium
Version: 13.0 (Queens)
CC: gcharot, hbrock, jschluet, jslagle, marjones, mburns, pgrist, yrabl
Target Milestone: z7
Keywords: TestOnly, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-puppet-elements-8.0.2-2.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2019-07-10 10:41:36 UTC

Description John Fulton 2019-03-11 13:34:28 UTC
This bug was initially created as a copy of Bug #1684146

I am copying this bug because: 

The ceph-ansible fix in 1684146 isn't sufficient to address the full bug because the numactl package also needs to be installed on the overcloud image. This bug will track getting it into the image.
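
A quick way to confirm that numactl is present is to check for the binary on a node deployed from the image. This is only a sketch; the helper name have_cmd is hypothetical and is not part of any TripleO or ceph-ansible tooling:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether numactl is available on this host.
if have_cmd numactl; then
  echo "numactl present"
else
  echo "numactl missing"
fi
```

On an RPM-based node, `rpm -q numactl` should give the same answer at the package level.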


Original report:

ceph-ansible has the ability to start OSDs with `docker run --cpuset-mems` or `--cpuset-cpus` [1]. However, there are benefits to starting OSD containers with numactl, because we could then use the `--preferred` option [2]. One way to implement this request is to modify the unit files and the ceph-osd-run script [3] to support a command prefix that the user can set.

[1] 
https://github.com/ceph/ceph-ansible/commit/8cba44262cf7291091b2318b563a28380e5049fd

[2] 
"""
When you numa-pin Ceph daemons, beware of difference between numactl --preferred and numactl --membind .  --preferred means that the program can allocate memory outside the NUMA socket if it has to, but --membind means you either swap or run the OOM killer when that NUMA node runs out of memory.   And that condition depends on what else has already been allocated.  Even with --preferred, you should do much better than with random allocation of memory from NUMA nodes.  But I don't see any way to express --preferred in ceph-ansible or "docker run" parameters, do you?  That to me makes ceph_osd_docker_cpuset_mems risky.  --preferred is a much softer landing when you run out of memory, and so it should be the default
"""

[3]
https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-osd/templates/ceph-osd-run.sh.j2#L73
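
The prefix idea from the original report could look roughly like the sketch below. This is an illustration under assumptions, not the ceph-ansible implementation: the function name build_osd_cmd and the image name example/osd-image are hypothetical, and NUMA node 0 is an arbitrary choice. The function only builds the command line so that the effect of the prefix is visible without starting a container.

```shell
#!/bin/sh
# Hypothetical sketch: build_osd_cmd and example/osd-image are illustrative
# names and are not taken from ceph-ansible.

# Print the command line for launching an OSD container, optionally
# prefixed with a user-supplied numactl invocation.
build_osd_cmd() {
  prefix=$1   # e.g. "numactl --preferred=0", or empty for no prefix
  shift
  if [ -n "$prefix" ]; then
    printf '%s %s\n' "$prefix" "$*"
  else
    printf '%s\n' "$*"
  fi
}

# Soft pinning: prefer NUMA node 0 but allow allocation from other nodes.
build_osd_cmd "numactl --preferred=0" docker run --rm example/osd-image

# Empty prefix: the command is left unchanged.
build_osd_cmd "" docker run --rm example/osd-image
```

Replacing the prefix with `numactl --membind=0` would give the strict behavior that the quoted note in [2] warns about, which is why a user-settable prefix defaulting to `--preferred` is attractive.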

Comment 2 Yogev Rabl 2019-06-25 14:26:52 UTC
Verified

Comment 3 Lon Hohberger 2019-07-10 10:41:36 UTC
According to our records, this should be resolved by openstack-tripleo-puppet-elements-8.0.2-2.el7ost.  This build is available now.