For NFV workloads, to achieve zero packet loss, Linux processes, ovs-dpdk (if applicable) and VMs are isolated via kernel arguments (isolcpus) and tuned profiles (cpu-partitioning). However, all Docker containers run without CPU isolation, because the "cpuset-cpus" parameter is left undefined. For example:

49582 ?  Sl   0:00  \_ /usr/bin/docker-containerd-shim-current c35beb0c9708a114cac3110ba8bf0235eb10864d0f13b653feebdf5b9e509677 /var/run/docker/libcontainerd/c35beb0c9708a114cac3110ba8bf0235eb10
49600 ?  Ss   0:00  |   \_ /bin/bash /neutron_ovs_agent_launcher.sh
49714 ?  S   24:59  |       \_ /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/n

# cat /proc/49582/status | grep Cpus_allowed_list
Cpus_allowed_list: 0,12
--> docker-containerd-shim-current is correctly isolated

# cat /proc/49600/status | grep Cpus_allowed_list
Cpus_allowed_list: 0-23
--> The process is allowed to run on ALL CPUs instead of only 0,12.

This can also be reproduced on a machine with the same tuning (isolcpus & cpu-partitioning) by starting a simple container:

# docker run --rm -ti centos:7 cat /proc/1/status | grep Cpus_allowed_list
Cpus_allowed_list: 0-23

To keep these containers isolated from both ovs-dpdk AND the virtual machines, the value of "CpusetCpus" has to be set when the container is started:

# docker run --rm --cpuset-cpus=$(cat /proc/self/status | awk '/Cpus_allowed_list/ {print $2}') -ti centos:7 cat /proc/1/status | grep Cpus_allowed_list
Cpus_allowed_list: 0,12

For all containers, "CpusetCpus" must be set to the non-isolated CPUs in order to avoid any interruption in packet processing (for ovs-dpdk and the VNFs). For existing deployments, the following command repins the containers:

# docker ps -q | xargs docker update --cpuset-cpus=$(cat /proc/self/status | awk '/Cpus_allowed_list/ {print $2}')
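The commands above rely on extracting the caller's Cpus_allowed_list from /proc/self/status with awk. A minimal, self-contained sketch of that extraction, run against a hypothetical status snippet instead of a live /proc file:

```shell
#!/bin/sh
# Sample lines as they appear in /proc/<pid>/status on a tuned host
# (values are illustrative, matching the 0,12 example above).
status_sample='Name: bash
Cpus_allowed: 001001
Cpus_allowed_list: 0,12'

# Same awk expression as in the docker update / docker run commands:
# match the Cpus_allowed_list line and print its value (field 2).
cpus=$(printf '%s\n' "$status_sample" | awk '/Cpus_allowed_list/ {print $2}')
echo "$cpus"   # -> 0,12
```

Note that the regex only matches the Cpus_allowed_list line, not the hex-mask Cpus_allowed line above it.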
If we apply the workaround above, we cannot start VMs (openstack server show on the failed VM):

| fault  | {u'message': u"Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2d1\\x2dinstance\\x2d00000001.scope/vcpu0/cpuset.cpus': Permission denied", u'code': 500, u'details': u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1858, in _do_build_and_run_instance\n    filter_properties, request_spec)\n  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2142, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'created': u'2019-09-10T18:44:28Z'} |
| flavor | vnfc (9b09d8fa-b32f-466c-909d-4aff9afe2d00) |

Libvirt will need to be excluded from the workaround above.
Indeed, pinning nova containers may lead to issues. Here is the proper command line, which re-pins all containers except nova*:

# docker ps -q | grep -v -E $(docker ps -q --filter='name=nova' | paste -sd "|" -) | xargs docker update --cpuset-cpus=$(cat /proc/self/status | awk '/Cpus_allowed_list/ {print $2}')

We can check, before and after, which processes remain allowed on all CPUs:

# for s in /proc/[0-9]* ; do if [[ $(grep $(lscpu | awk '/On-line/ {print $4}') $s/status) ]]; then cat $s/cmdline ; echo '' ; fi ; done | sort -u
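The exclusion in the command above works by joining the nova container IDs with "|" so grep -v -E can filter them out. A small sketch of that regex construction, using hypothetical container IDs in place of the docker ps output:

```shell
#!/bin/sh
# Hypothetical output of: docker ps -q --filter='name=nova'
# (two made-up short container IDs, one per line)
nova_ids='c35beb0c9708
9b09d8fab32f'

# paste -s serializes all lines into one, -d '|' joins them with "|",
# producing an alternation pattern suitable for grep -E.
pattern=$(printf '%s\n' "$nova_ids" | paste -sd '|' -)
echo "$pattern"   # -> c35beb0c9708|9b09d8fab32f
```

Any container ID matching the pattern is then dropped by grep -v -E before the remaining IDs are passed to docker update.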
On the compute-0 server:

[heat-admin@compute-0 ~]$ cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list: 0-1
[heat-admin@compute-0 ~]$ sudo docker inspect nova_compute | grep CpusetCpus
            "CpusetCpus": "0,1",

Containers on compute-0 are pinned.

On controller-0 (to verify it was not broken):

[heat-admin@controller-0 ~]$ cat /proc/self/status | grep Cpus_allowed_list
Cpus_allowed_list: 0-7
[heat-admin@controller-0 ~]$ sudo docker inspect nova_api_db_sync | grep CpusetCpus
            "CpusetCpus": "0,1,2,3,4,5,6,7",

Containers on controller-0 are not pinned.
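When comparing a Cpus_allowed_list such as "0-1" against an inspected CpusetCpus value such as "0,1", it can help to expand both into individual CPU IDs. A minimal sketch (the expand_cpuset helper is hypothetical, not part of any tool above), assuming the usual cpuset list syntax of comma-separated IDs and dash ranges:

```shell
#!/bin/sh
# Expand a cpuset list ("0,12", "0-3", "0-1,6") into one CPU ID per line,
# so values from /proc/*/status and docker inspect can be diffed directly.
expand_cpuset() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    # A bare ID has no "hi" part; fall back to "lo" for a single CPU.
    seq "$lo" "${hi:-$lo}"
  done
}

expand_cpuset "0-3"    # -> 0 1 2 3 (one per line)
expand_cpuset "0,12"   # -> 0 12 (one per line)
```

Expanding "0-1" and "0,1" this way yields the same set of CPUs, confirming that the nova_compute pinning on compute-0 matches its Cpus_allowed_list.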
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760
For the record, the initial implementation created a regression for PPC, see https://bugzilla.redhat.com/show_bug.cgi?id=1813091