Jun 10 14:57:04 overcloud-novacomputeppc64le-0 podman[43458]: Error: unable to start container "iscsid": container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"failed to write \\\"0,8,16,24,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191\\\" to \\\"/sys/fs/cgroup/cpuset/machine.slice/libpod-e01a419f646f17e1358c6d2c172579480fd2c4b6f39de05db0382a0a4089aa3e.scope/cpuset.cpus\\\": write /sys/fs/cgroup/cpuset/machine.slice/libpod-e01a419f646f17e1358c6d2c172579480fd2c4b6f39de05db0382a0a4089aa3e.scope/cpuset.cpus: invalid argument\"": OCI runtime error

When using a power8 compute node, this error occurs and leads to a failed deployment.

In addressing BZ1835269 (execute kvm-setup in startup of nova_libvirt container), there was an unintended side effect...
- tripleo configures cpuset_cpus for the iscsid container based on the current state of the cpus
- nova_libvirt container starts first and disables smt -- now the state of the cpus is different
- iscsid container starts, but its configuration is based on outdated state, so it fails

It is conceivable that the problem may affect other containers.
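The ordering problem described above can be illustrated with a small check that compares a previously computed cpuset against the CPUs that are online now. This is only a sketch for diagnosis, not part of tripleo or paunch: the helper names (parse_cpu_list, stale_cpus, read_online_cpus) are hypothetical, and it assumes the standard Linux sysfs file /sys/devices/system/cpu/online is the right source for the current CPU state.

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0,8,16-19" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or a trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def stale_cpus(configured, online):
    """Return the sorted CPUs named in `configured` that are not in `online`."""
    return sorted(parse_cpu_list(configured) - parse_cpu_list(online))


def read_online_cpus(path="/sys/devices/system/cpu/online"):
    """Read the current online CPU list (assumed sysfs location)."""
    with open(path) as handle:
        return handle.read()
```

For example, stale_cpus("0,8,16,24,28-31", "0,8,16,24") returns [28, 29, 30, 31]. A non-empty result means the container's cpuset was computed before the CPU state changed, which would be consistent with the "invalid argument" write failure in the log.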
We have a number of ways to correct this defect on Power8:

1. A patch to Paunch to *never* emit the --cpu_sets arg to podman. This seems like a bad idea and will create a user-visible change.
2. Create 2 ansible tasks (run at step 0): 1) to disable SMT on power8 CPUs at system boot; and 2) manually run that before any container config.
3. Update all containers with something like:
   * https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/nova/nova-libvirt-container-puppet.yaml#L228
   * https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/nova/nova-libvirt-container-puppet.yaml#L699
   to effectively implement option 1.
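As a rough illustration of the difference between options 1 and 3, the sketch below builds the CPU pinning arguments for a podman command line and omits them only for containers that opt out. It is not paunch's actual code: build_cpuset_args is a hypothetical helper, and treating the value "all" as the opt-out marker is an assumption made for this example. The only real interface used is podman's --cpuset-cpus flag.

```python
def build_cpuset_args(cpuset_cpus):
    """Return podman arguments for CPU pinning, or none if pinning is skipped.

    Hypothetical helper: "all" (or None) is assumed here to mean
    "do not pin this container", so no flag is emitted and the container
    is not tied to a CPU list that may be stale by the time it starts.
    """
    if cpuset_cpus is None or cpuset_cpus == "all":
        return []
    return ["--cpuset-cpus", cpuset_cpus]


def build_run_command(name, image, cpuset_cpus=None):
    """Assemble an illustrative `podman run` argument list."""
    cmd = ["podman", "run", "--name", name]
    cmd.extend(build_cpuset_args(cpuset_cpus))
    cmd.append(image)
    return cmd
```

With this shape, a per-container setting keeps the change scoped to the containers that need it, whereas dropping the flag unconditionally (option 1) would change behavior for every container.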
https://review.opendev.org/#/c/737549/ might be a fix; can you test it?
the paunch changes in 737549 are helpful -- deploy on power8 succeeds
*** Bug 1834901 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3148