Bug 1847188 - tripleo_iscsid doesn't start on power8 compute node
Summary: tripleo_iscsid doesn't start on power8 compute node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-paunch
Version: 16.1 (Train)
Hardware: ppc64le
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Jeremy Freudberg
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks: 1835269
 
Reported: 2020-06-15 20:55 UTC by Jeremy Freudberg
Modified: 2020-07-29 07:53 UTC
CC List: 9 users

Fixed In Version: python-paunch-5.3.3-0.20200527083421.16ae5e4.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 07:53:11 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 737549 0 None MERGED podman: get cpus allowed list only when isolcpus in cmdline 2020-09-02 12:29:09 UTC
Red Hat Product Errata RHBA-2020:3148 0 None None None 2020-07-29 07:53:32 UTC

Description Jeremy Freudberg 2020-06-15 20:55:30 UTC
Jun 10 14:57:04 overcloud-novacomputeppc64le-0 podman[43458]: Error: unable to start container "iscsid": container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"failed to write \\\"0,8,16,24,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191\\\" to \\\"/sys/fs/cgroup/cpuset/machine.slice/libpod-e01a419f646f17e1358c6d2c172579480fd2c4b6f39de05db0382a0a4089aa3e.scope/cpuset.cpus\\\": write /sys/fs/cgroup/cpuset/machine.slice/libpod-e01a419f646f17e1358c6d2c172579480fd2c4b6f39de05db0382a0a4089aa3e.scope/cpuset.cpus: invalid argument\"": OCI runtime error


When using a power8 compute node, this error occurs and leads to a failed deployment.
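The "invalid argument" in the OCI runtime error comes from the kernel rejecting a `cpuset.cpus` write that names CPUs that are no longer online. A minimal sketch of that mismatch (CPU numbers here are illustrative, not the exact host topology):

```python
def parse_cpu_list(spec):
    """Parse a kernel cpu-list string like "0-7,16,24-31" into a set of ints."""
    cpus = set()
    for chunk in spec.split(","):
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return cpus

# Hypothetical POWER8 host with SMT-8 cores: after `ppc64_cpu --smt=off`,
# only thread 0 of each core stays online (CPUs 0, 8, 16, 24, ...).
online = parse_cpu_list("0,8,16,24")      # state after nova_libvirt disables SMT
configured = parse_cpu_list("0-31")       # cpuset captured while SMT was still on

# The kernel refuses a cpuset.cpus write naming offline CPUs (EINVAL),
# which podman surfaces as the "invalid argument" error above.
offline_requested = configured - online
```

Any non-empty `offline_requested` set means the write to `cpuset.cpus` fails and the container cannot start.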


In addressing BZ1835269 (execute kvm-setup in the startup of the nova_libvirt container), there was an unintended side effect:
- tripleo configures cpuset_cpus for the iscsid container based on the current state of the CPUs
- the nova_libvirt container starts first and disables SMT -- now the set of online CPUs is different
- the iscsid container starts, but its configuration is based on the outdated CPU state, so it fails


It is conceivable that the problem affects other containers as well.
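The eventual fix (gerrit change 737549, "podman: get cpus allowed list only when isolcpus in cmdline") sidesteps the stale snapshot by only pinning the cpuset when the host was booted with CPU isolation. A rough sketch of that logic, with a hypothetical function name rather than the actual paunch code:

```python
def cpuset_cpus_arg(kernel_cmdline, cpus_allowed_list):
    """
    Sketch of the approach in gerrit change 737549 (hypothetical helper,
    not the real paunch function): only emit --cpuset-cpus to podman when
    the kernel command line requests CPU isolation via isolcpus=. On hosts
    without isolation, returning no argument lets the container inherit
    the CPUs that are actually online at start time, so a later SMT change
    cannot leave it pinned to offline CPUs.
    """
    if "isolcpus" not in kernel_cmdline:
        return []
    return ["--cpuset-cpus", cpus_allowed_list]
```

On a real host the inputs would come from `/proc/cmdline` and the `Cpus_allowed_list` line of `/proc/self/status`.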

Comment 1 Tony Breeds 2020-06-22 22:58:46 UTC
We have a number of ways to correct this defect on Power8:
1. Patch Paunch to *never* emit the --cpuset-cpus arg to podman. This seems like a bad idea and would create a user-visible change.
2. Create 2 ansible tasks (run at step 0):
    1) disable SMT on power8 CPUs at system boot; and
    2) manually run that before any container config.
3. Update all containers with something like:
    * https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/nova/nova-libvirt-container-puppet.yaml#L228
    * https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/nova/nova-libvirt-container-puppet.yaml#L699
   to effectively implement option 1.
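For reference, the boot-time half of option 2 could be sketched roughly as follows (hypothetical task, not the merged fix; `ppc64_cpu --smt=off` is the standard powerpc-utils command for disabling SMT, and persisting the setting across reboots would need an additional unit or udev rule):

```yaml
# Hypothetical sketch of option 2's first task -- not merged code.
- name: Disable SMT on POWER8 compute nodes before any container config
  command: ppc64_cpu --smt=off
  when: ansible_architecture == "ppc64le"
```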

Comment 2 Emilien Macchi 2020-06-23 15:50:04 UTC
https://review.opendev.org/#/c/737549/ might fix this -- can you test it?

Comment 3 Jeremy Freudberg 2020-06-23 21:00:20 UTC
The paunch changes in 737549 are helpful -- deploy on power8 succeeds.

Comment 4 Alex Schultz 2020-06-24 13:39:30 UTC
*** Bug 1834901 has been marked as a duplicate of this bug. ***

Comment 18 errata-xmlrpc 2020-07-29 07:53:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148

