Bug 1709564 - Overcloud deployment fails with container permissions error (ppc64le)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 15.0 (Stein)
Assignee: Tony Breeds
QA Contact: Jason Joyce
URL:
Whiteboard:
Depends On:
Blocks: 1667964 1723665 1726483
Reported: 2019-05-13 20:14 UTC by David Benoit
Modified: 2019-09-26 10:50 UTC

Fixed In Version: openstack-tripleo-heat-templates-10.6.1-0.20190711090428.245f17c.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1723665 (view as bug list)
Environment:
Last Closed: 2019-09-21 11:21:58 UTC
Target Upstream Version:
Embargoed:


Attachments
selinux audit log (7.99 MB, text/plain)
2019-06-04 14:35 UTC, David Benoit


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 666498 | 0 | 'None' | MERGED | Increase the default memlock to 64MiB via ``DockerNovaComputeUlimit``. | 2021-01-26 10:23:20 UTC
OpenStack gerrit | 669791 | 0 | 'None' | MERGED | Increase the default memlock to 64MiB via ``DockerNovaComputeUlimit``. | 2021-01-26 10:23:20 UTC
Red Hat Product Errata | RHEA-2019:2811 | 0 | None | None | None | 2019-09-21 11:22:17 UTC

Description David Benoit 2019-05-13 20:14:29 UTC
Description of problem:
Overcloud deployment fails with container permissions error.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190509.n.0 puddle
RHOS_TRUNK-15.0-RHEL-8-20190509.n.1 puddle

How reproducible:
Always

Steps to Reproduce:
1. Deploy undercloud using tripleo
2. Deploy overcloud using tripleo

Actual results:
fatal: [overcloud-novacomputeppc64le-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Error running ['podman', 'run', '--name', 'nova_cell_v2_discover_hosts', '--label', 'config_id=tripleo_step5', '--label', 'container_name=nova_cell_v2_discover_hosts', '--label', 'managed_by=paunch', '--label', 'config_data={\"command\": \"/usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c \\'/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\\'\", \"detach\": false, \"environment\": [\"TRIPLEO_DEPLOY_IDENTIFIER=1557430479\", \"__OS_DEBUG=true\"], \"image\": \"brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0\", \"net\": \"host\", \"start_order\": 0, \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro\", \"/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro\", \"/var/log/containers/nova:/var/log/nova\", \"/var/lib/container-config-scripts/:/container-config-scripts/\"]}', '--conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid', '--log-driver', 'json-file', '--log-opt', 'path=/var/log/containers/stdouts/nova_cell_v2_discover_hosts.log', '--env=TRIPLEO_DEPLOY_IDENTIFIER=1557430479', '--env=__OS_DEBUG=true', '--net=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', 
'--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro', '--volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro', '--volume=/var/log/containers/nova:/var/log/nova', '--volume=/var/lib/container-config-scripts/:/container-config-scripts/', 'brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0', '/usr/bin/bootstrap_host_exec', 'nova_compute', 'su', 'nova', '-s', '/bin/bash', '-c', \"'/container-config-scripts/pyshim.sh\", \"/container-config-scripts/nova_cell_v2_discover_hosts.py'\"]. [1]",
        "",
        "stdout: ",
        "stderr: su: cannot open session: Permission denied"
    ]
}


Expected results:
Overcloud deploys without errors.

Other info:
Logs and template files available on request.

Comment 3 Julie Pichon 2019-06-04 08:26:09 UTC
Could you run this again in permissive mode and then attach the audit.log file to this bug? Thank you.

Comment 4 David Benoit 2019-06-04 14:35:41 UTC
Created attachment 1577155 [details]
selinux audit log

Audit log from the TripleO install in SELinux permissive mode. I believe this is the oldest one, and it should contain the audits produced during the TripleO overcloud installation.

Comment 5 David Benoit 2019-06-04 14:40:36 UTC
To clarify a bit further, all of the deployments so far have been run in permissive mode, and we are still seeing these errors. It is not entirely clear to us whether this is SELinux or some other permissions issue.

Thanks!

Comment 6 Julie Pichon 2019-06-04 15:39:22 UTC
Thank you for the file and additional context. Do you mean the deployment fails with the "Permission Denied" error even with SELinux set to permissive?

There are some AVC denials related to dbus sockets and containers that may be of interest, e.g.

type=AVC msg=audit(1559585457.605:2442): avc:  denied  { connectto } for  pid=40544 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c432,c848 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1559585460.222:2453): avc: denied { connectto } for pid=40825 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c16,c110 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1

and a bunch of other denials related to rhsmcertd and dnf/rpms that may not be, if we are using internal puddles?

type=AVC msg=audit(1559596349.030:14473): avc:  denied  { open } for  pid=321713 comm="rhsmcertd-worke" path="/var/cache/dnf/metadata_lock.pid" dev="dm-0" ino=107977532 scontext=system_u:system_r:rhsmcertd_t:s0 tcontext=system_u:object_r:rpm_var_cache_t:s0 tclass=file permissive=1
type=AVC msg=audit(1559596349.203:14474): avc:  denied  { open } for  pid=321713 comm="rhsmcertd-worke" path="/var/cache/dnf/rhel-8-for-x86_64-appstream-rpms-9d3886b51bb367d7/repodata/repomd.xml" dev="dm-0" ino=40090136 scontext=system_u:system_r:rhsmcertd_t:s0 tcontext=unconfined_u:object_r:rpm_var_cache_t:s0 tclass=file permissive=1

Are the overcloud deployment log files also available, to pinpoint the full failure error message?

Comment 7 Julie Pichon 2019-06-10 17:01:17 UTC
Although there are SELinux-related messages we'll likely want to clean up in the future, this is not the source of the current problem because the failure still occurs when running in permissive mode.

(undercloud) [stack@director-ci07 ~]$ getenforce
Permissive
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ getenforce
Permissive

On the compute node, I found this message in /var/log/secure-20190609 that seems to match the failure timestamp for the error in comment 0:

Jun  3 19:44:18 overcloud-novacomputeppc64le-0 su[38956]: pam_limits(su:session): Could not set limit for 'memlock': Operation not permitted

I'm not sure if this is related to the container nova_cell_v2_discover_hosts in particular or if it's something more generic, so I'm moving this to DFG Compute for further investigation at this point.

The reporter has a live environment available for debugging and indicated that the problem started occurring sometime between puddles 2019-05-07 (last known success with tag 2019-05-07 15:38:36) and 2019-05-09 (when it was definitely failing).
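
The pam_limits message quoted above can be searched for across rotated log files. The following is a minimal sketch, assuming the messages land in files such as /var/log/secure-20190609 as in this report; `find_pam_limit_failures` is a hypothetical helper name, not part of any TripleO or PAM tooling.

```shell
# Hypothetical helper: print pam_limits failures for one limit name
# (for example "memlock") from the log files given as arguments.
find_pam_limit_failures() {
    local limit_name="$1"
    shift
    # -h drops the file name prefix so the output matches the raw log lines.
    # In a basic regular expression the parentheses are literal characters.
    grep -h "pam_limits(.*): Could not set limit for '${limit_name}'" "$@"
}
```

Run as root on the compute node, `find_pam_limit_failures memlock /var/log/secure*` would list entries like the one above, which can then be matched against the deployment failure timestamp.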

Comment 8 Martin Schuppert 2019-06-14 06:48:34 UTC
This could be a ulimit issue. Can this be reproduced outside the deploy, just by running:

$ podman exec -it -u root nova_compute su nova -s /bin/bash

Comment 9 David Benoit 2019-06-17 17:11:16 UTC
>This could be an ulimit issue. Can this be reproduced outside the deploy, just by running:
>
>$ podman exec -it -u root nova_compute su nova -s /bin/bash

The container seems to run all right when podman is executed as root, but the container is not visible to the heat-admin user.

```
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ podman exec -it -u root nova_compute su nova -s /bin/bash
unable to exec into nova_compute: no container with name or ID nova_compute found: no such container
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ sudo podman exec -it -u root nova_compute su nova -s /bin/bash
()[nova@overcloud-novacomputeppc64le-0 /]$
```

Comment 10 Tony Breeds 2019-06-18 23:37:05 UTC
(In reply to Martin Schuppert from comment #8)
> This could be an ulimit issue. Can this be reproduced outside the deploy,
> just by running:
> 
> $ podman exec -it -u root nova_compute su nova -s /bin/bash

That works for me:
 [stack@director ~]$ sudo podman exec -it -u root nova_compute su nova -s /bin/bash
 ()[nova@director /]$ exit

The "good" news is I'm now hitting this too ;P

Comment 11 Tony Breeds 2019-06-19 02:27:18 UTC
(In reply to Tony Breeds from comment #10)
> (In reply to Martin Schuppert from comment #8)
> > This could be an ulimit issue. Can this be reproduced outside the deploy,
> > just by running:
> > 
> > $ podman exec -it -u root nova_compute su nova -s /bin/bash
> 
> That works for me:
>  [stack@director ~]$ sudo podman exec -it -u root nova_compute su nova -s
> /bin/bash
>  ()[nova@director /]$ exit
> 
> The "good" news is I'm now hitting this too ;P

And of course it "worked" because I ran it on the director, not the ppc64le compute node :(

Comment 12 Tony Breeds 2019-06-19 02:57:42 UTC
Looks like it is ulimits

[root@overcloud-novacomputeppc64le-0 ~]# bash /tmp/1709564_repro.sh
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 425883
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 425883
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
+ podman rm nova_cell_v2_discover_hosts1
57f74e9f6c451821402768cf6a6602b79f977a0cb9644b26b550fc2bf620f74c
+ podman run --name nova_cell_v2_discover_hosts1 --label config_id=tripleo_step5 --label container_name=nova_cell_v2_discover_hosts --label managed_by=paunch --label ''\''config_data={"command": "/usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c \'\''/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\'\''", "detach": false, "environment": ["TRIPLEO_DEPLOY_IDENTIFIER=1560841493", "__OS_DEBUG=true"], "image": "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44", "net": "host", "start_order": 0, "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro", "/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro", "/var/log/containers/nova:/var/log/nova", "/var/lib/container-config-scripts/:/container-config-scripts/"]}'\''' --conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid --log-driver json-file --log-opt path=/var/log/containers/stdouts/nova_cell_v2_discover_hosts.log --env=TRIPLEO_DEPLOY_IDENTIFIER=1560841493 --env=__OS_DEBUG=true --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro --volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/container-config-scripts/:/container-config-scripts/ brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44 /usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c '/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py'
su: cannot open session: Permission denied
+ ulimit -l unlimited
+ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 425883
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 425883
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
+ podman rm nova_cell_v2_discover_hosts1
99aeef0b277088a9d2cc2aa3ebb4cda9f86c42bc18981cc637ec3c2736c59a41
+ podman run --name nova_cell_v2_discover_hosts1 --label config_id=tripleo_step5 --label container_name=nova_cell_v2_discover_hosts --label managed_by=paunch --label ''\''config_data={"command": "/usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c \'\''/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\'\''", "detach": false, "environment": ["TRIPLEO_DEPLOY_IDENTIFIER=1560841493", "__OS_DEBUG=true"], "image": "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44", "net": "host", "start_order": 0, "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro", "/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro", "/var/log/containers/nova:/var/log/nova", "/var/lib/container-config-scripts/:/container-config-scripts/"]}'\''' --conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid --log-driver json-file --log-opt path=/var/log/containers/stdouts/nova_cell_v2_discover_hosts.log --env=TRIPLEO_DEPLOY_IDENTIFIER=1560841493 --env=__OS_DEBUG=true --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro --volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/container-config-scripts/:/container-config-scripts/ brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44 /usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c '/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py'
Usage: pyshim.sh <script and/or arguments>
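
In the transcript above, the two runs differ only in the locked-memory limit of the calling shell: `su` fails at 16384 KiB and opens a session once the limit is raised. A quick pre-check along those lines might look like the sketch below. `memlock_at_least` is a hypothetical helper, and the 64 MiB threshold is taken from the linked gerrit changes; whether that is exactly the value pam_limits requests inside the container is an assumption.

```shell
# Hypothetical helper, not part of TripleO or podman.
# Succeeds when a value reported by `ulimit -l` (KiB, or "unlimited")
# is at least the required number of KiB.
memlock_at_least() {
    local current="$1" required_kib="$2"
    if [ "$current" = "unlimited" ]; then
        return 0
    fi
    [ "$current" -ge "$required_kib" ]
}

# 64 MiB expressed in KiB, the unit `ulimit -l` reports.
REQUIRED_KIB=$((64 * 1024))

if memlock_at_least "$(ulimit -l)" "$REQUIRED_KIB"; then
    echo "memlock OK: $(ulimit -l)"
else
    echo "memlock too low: $(ulimit -l) KiB < ${REQUIRED_KIB} KiB"
fi
```

With the 16384 KiB limit shown in the first `ulimit -a` listing, this check would report the limit as too low.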

Comment 13 Martin Schuppert 2019-06-25 07:06:46 UTC
Thanks for confirming that it's the ulimit issue, and for the submitted patches!
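
The linked gerrit changes raise the default memlock to 64MiB through the `DockerNovaComputeUlimit` parameter. On a build without that fix, a similar override could presumably be supplied through a Heat environment file. The sketch below generates one; the helper name `write_memlock_env`, the output file name, and the assumption that the parameter takes a list of `name=value` strings (with memlock given in bytes) are illustrative and not confirmed by this report.

```shell
# Hypothetical helper: write a Heat environment file that overrides
# DockerNovaComputeUlimit with a single memlock entry.
write_memlock_env() {
    local out_file="$1" memlock_bytes="$2"
    cat > "$out_file" <<EOF
parameter_defaults:
  # Overriding the list replaces the template default, so any other
  # entries from that default would need to be repeated here.
  DockerNovaComputeUlimit:
    - memlock=${memlock_bytes}
EOF
}

# 64 MiB in bytes, matching the value named in the linked gerrit changes.
write_memlock_env "${TMPDIR:-/tmp}/nova-compute-memlock.yaml" $((64 * 1024 * 1024))
```

The generated file would then be passed to the overcloud deploy command with an additional `-e` argument, alongside the environment files already in use.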

Comment 20 errata-xmlrpc 2019-09-21 11:21:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

