Bug 1709564
| Summary: | Overcloud deployment fails with container permissions error (ppc64le) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Benoit <dbenoit> | ||||
| Component: | openstack-tripleo-heat-templates | Assignee: | Tony Breeds <tonyb> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Jason Joyce <jjoyce> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 15.0 (Stein) | CC: | dasmith, eglynn, jfreudbe, jhakimra, jlabarre, jpichon, jpoulin, jschluet, kchamart, lhh, lvrabec, lyarwood, m.andre, mbooth, mburns, mschuppe, sbauza, sclewis, sgordon, tonyb, vromanso, zcaplovi | ||||
| Target Milestone: | rc | Keywords: | Reopened, Triaged | ||||
| Target Release: | 15.0 (Stein) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-tripleo-heat-templates-10.6.1-0.20190711090428.245f17c.el8ost | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1723665 (view as bug list) | Environment: | |||||
| Last Closed: | 2019-09-21 11:21:58 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1667964, 1723665, 1726483 | ||||||
| Attachments: |
Could this be run again in permissive mode, and then attach the audit.log file to this bug? Thank you.

Created attachment 1577155 [details]
selinux audit log
audit log from tripleo install in selinux permissive mode. I believe this is the oldest one, and should contain the audits produced during the tripleo overcloud installation.

To clarify a bit further, all of the deployments so far have been run in permissive mode, and we are still seeing these errors. It is not entirely clear to us whether this is SELinux or some other permissions issue. Thanks!

Thank you for the file and additional context. Do you mean the deployment fails with the "Permission Denied" error even with SELinux set to permissive?
There are some AVC denials related to dbus sockets and containers that may be of interest, e.g.
type=AVC msg=audit(1559585457.605:2442): avc: denied { connectto } for pid=40544 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c432,c848 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1559585460.222:2453): avc: denied { connectto } for pid=40825 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c16,c110 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1
and a bunch of other denials related to rhsmcertd and dnf/rpms that may not be, if we are using internal puddles?
type=AVC msg=audit(1559596349.030:14473): avc: denied { open } for pid=321713 comm="rhsmcertd-worke" path="/var/cache/dnf/metadata_lock.pid" dev="dm-0" ino=107977532 scontext=system_u:system_r:rhsmcertd_t:s0 tcontext=system_u:object_r:rpm_var_cache_t:s0 tclass=file permissive=1
type=AVC msg=audit(1559596349.203:14474): avc: denied { open } for pid=321713 comm="rhsmcertd-worke" path="/var/cache/dnf/rhel-8-for-x86_64-appstream-rpms-9d3886b51bb367d7/repodata/repomd.xml" dev="dm-0" ino=40090136 scontext=system_u:system_r:rhsmcertd_t:s0 tcontext=unconfined_u:object_r:rpm_var_cache_t:s0 tclass=file permissive=1
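
For triage, denials like the ones quoted above can be grouped by command, source type, target type, class and permission, which makes it easier to see which ones involve `container_t` and which are unrelated noise. The following is only a minimal sketch: the regular expression covers the field layout seen in the records in this comment and is not a complete audit log parser, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Fields of interest in an AVC denial record, in the order they appear
# in the records quoted in this bug. Other record layouts are not handled.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?'
    r'comm="(?P<comm>[^"]*)".*?'
    r'scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)'
)


def selinux_type(context):
    """Return the type field (third component) of an SELinux context."""
    return context.split(":")[2]


def summarize_avc(lines):
    """Count denials per (comm, source type, target type, class, permissions)."""
    counts = Counter()
    for line in lines:
        match = AVC_RE.search(line)
        if match is None:
            continue  # not an AVC denial in the expected layout
        key = (
            match["comm"],
            selinux_type(match["scontext"]),
            selinux_type(match["tcontext"]),
            match["tclass"],
            match["perms"],
        )
        counts[key] += 1
    return counts
```

Passing an open file object for the attached audit log to `summarize_avc` and printing `most_common()` on the result gives one line per distinct denial.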
Are the overcloud deployment log files also available, to pinpoint the full failure error message?

Although there are SELinux-related messages we'll likely want to clean up in the future, this is not the source of the current problem because the failure still occurs when running in permissive mode.

(undercloud) [stack@director-ci07 ~]$ getenforce
Permissive
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ getenforce
Permissive

On the compute node, I found this message in /var/log/secure-20190609 that seems to match the failure timestamp for the error in comment 0:

Jun 3 19:44:18 overcloud-novacomputeppc64le-0 su[38956]: pam_limits(su:session): Could not set limit for 'memlock': Operation not permitted

I'm not sure if this is related to the container nova_cell_v2_discover_hosts in particular or if it's something more generic so, moving to DFG Compute for further investigation at this point. The reporter has a live environment available for debugging and indicated that the problem started occurring sometime between puddles 2019-05-07 (last known success with tag 2019-05-07 15:38:36) and 2019-05-09 (when it was definitely failing).

This could be an ulimit issue. Can this be reproduced outside the deploy, just by running:

$ podman exec -it -u root nova_compute su nova -s /bin/bash

>This could be an ulimit issue. Can this be reproduced outside the deploy, just by running:
>
>$ podman exec -it -u root nova_compute su nova -s /bin/bash

The container seems to run alright when podman is executed from root, but the container is not visible to the heat-admin user.

```
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ podman exec -it -u root nova_compute su nova -s /bin/bash
unable to exec into nova_compute: no container with name or ID nova_compute found: no such container
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ sudo podman exec -it -u root nova_compute su nova -s /bin/bash
()[nova@overcloud-novacomputeppc64le-0 /]$
```

(In reply to Martin Schuppert from comment #8)
> This could be an ulimit issue. Can this be reproduced outside the deploy,
> just by running:
>
> $ podman exec -it -u root nova_compute su nova -s /bin/bash

That works for me:
[stack@director ~]$ sudo podman exec -it -u root nova_compute su nova -s /bin/bash
()[nova@director /]$ exit

The "good" news is I'm now hitting this too ;P

(In reply to Tony Breeds from comment #10)
> (In reply to Martin Schuppert from comment #8)
> > This could be an ulimit issue. Can this be reproduced outside the deploy,
> > just by running:
> >
> > $ podman exec -it -u root nova_compute su nova -s /bin/bash
>
> That works for me:
> [stack@director ~]$ sudo podman exec -it -u root nova_compute su nova -s
> /bin/bash
> ()[nova@director /]$ exit
>
> The "good" news is I'm now hitting this too ;P

And of course it "worked" because I ran it on the director not the ppc64le compute node :(

Looks like it is ulimits
[root@overcloud-novacomputeppc64le-0 ~]# bash /tmp/1709564_repro.sh
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 425883
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 425883
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
+ podman rm nova_cell_v2_discover_hosts1
57f74e9f6c451821402768cf6a6602b79f977a0cb9644b26b550fc2bf620f74c
+ podman run --name nova_cell_v2_discover_hosts1 --label config_id=tripleo_step5 --label container_name=nova_cell_v2_discover_hosts --label managed_by=paunch --label ''\''config_data={"command": "/usr/bin/bootstra
p_host_exec nova_compute su nova -s /bin/bash -c \'\''/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\'\''", "detach": false, "environment": ["TRIPLEO_DEPLOY_IDENTIFIER
=1560841493", "__OS_DEBUG=true"], "image": "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44", "net": "host", "start_order": 0, "user": "root", "volumes": ["/etc/hosts:/e
tc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle
.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_k
nown_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro", "/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro", "/var/log/co
ntainers/nova:/var/log/nova", "/var/lib/container-config-scripts/:/container-config-scripts/"]}'\''' --conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid --log-driver json-file --log-opt path=/var/log/contain
ers/stdouts/nova_cell_v2_discover_hosts.log --env=TRIPLEO_DEPLOY_IDENTIFIER=1560841493 --env=__OS_DEBUG=true --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --vol
ume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.c
rt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:
/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro --volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro --volum
e=/var/log/containers/nova:/var/log/nova --volume=/var/lib/container-config-scripts/:/container-config-scripts/ brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44 /usr/bin/
bootstrap_host_exec nova_compute su nova -s /bin/bash -c '/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py'
su: cannot open session: Permission denied
+ ulimit -l unlimited
+ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 425883
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 425883
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
+ podman rm nova_cell_v2_discover_hosts1
99aeef0b277088a9d2cc2aa3ebb4cda9f86c42bc18981cc637ec3c2736c59a41
+ podman run --name nova_cell_v2_discover_hosts1 --label config_id=tripleo_step5 --label container_name=nova_cell_v2_discover_hosts --label managed_by=paunch --label ''\''config_data={"command": "/usr/bin/bootstra
p_host_exec nova_compute su nova -s /bin/bash -c \'\''/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\'\''", "detach": false, "environment": ["TRIPLEO_DEPLOY_IDENTIFIER
=1560841493", "__OS_DEBUG=true"], "image": "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44", "net": "host", "start_order": 0, "user": "root", "volumes": ["/etc/hosts:/e
tc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle
.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_k
nown_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro", "/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro", "/var/log/co
ntainers/nova:/var/log/nova", "/var/lib/container-config-scripts/:/container-config-scripts/"]}'\''' --conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid --log-driver json-file --log-opt path=/var/log/contain
ers/stdouts/nova_cell_v2_discover_hosts.log --env=TRIPLEO_DEPLOY_IDENTIFIER=1560841493 --env=__OS_DEBUG=true --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --vol
ume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.c
rt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:
/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro --volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro --volum
e=/var/log/containers/nova:/var/log/nova --volume=/var/lib/container-config-scripts/:/container-config-scripts/ brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44 /usr/bin/
bootstrap_host_exec nova_compute su nova -s /bin/bash -c '/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py'
Usage: pyshim.sh <script and/or arguments>
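
The two runs above differ only in the locked-memory limit of the invoking shell (16384 kbytes versus unlimited), which fits the pam_limits "Could not set limit for 'memlock': Operation not permitted" message found earlier: a process without CAP_SYS_RESOURCE may lower its hard limit but not raise it. The sketch below reads the current memlock limit and checks whether a requested value could be applied without that capability. The requested value is hypothetical, since the limits configuration inside the container image is not shown in this bug, and the helper names are made up for illustration.

```python
import resource


def describe(limit):
    """Render an rlimit value (bytes) the way `ulimit -l` does (kbytes)."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit // 1024)


def can_apply_without_privilege(requested, current_hard):
    """True if `requested` does not exceed the current hard limit.

    Raising a hard limit needs CAP_SYS_RESOURCE; keeping or lowering it does not.
    """
    if current_hard == resource.RLIM_INFINITY:
        return True
    if requested == resource.RLIM_INFINITY:
        return False
    return requested <= current_hard


soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("max locked memory (kbytes, -l): soft", describe(soft), "hard", describe(hard))

# Hypothetical request: 64 MiB of locked memory for the target user.
requested = 64 * 1024 * 1024
print("request fits under hard limit:", can_apply_without_privilege(requested, hard))
```

With a 16384 kbyte hard limit, any larger request fails this check, while under an unlimited hard limit every request passes. That matches the before and after behaviour of `su` in the reproducer, though it does not by itself show which limit the container actually asks for.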

thx for confirming that it's the ulimit issue, and for the submitted patches!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

Description of problem:
Overcloud deployment fails with container permissions error.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190509.n.0 puddle
RHOS_TRUNK-15.0-RHEL-8-20190509.n.1 puddle

How reproducible:
Always

Steps to Reproduce:
1. Deploy undercloud using tripleo
2. Deploy overcloud using tripleo

Actual results:
fatal: [overcloud-novacomputeppc64le-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Error running ['podman', 'run', '--name', 'nova_cell_v2_discover_hosts', '--label', 'config_id=tripleo_step5', '--label', 'container_name=nova_cell_v2_discover_hosts', '--label', 'managed_by=paunch', '--label', 'config_data={\"command\": \"/usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c \\'/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\\'\", \"detach\": false, \"environment\": [\"TRIPLEO_DEPLOY_IDENTIFIER=1557430479\", \"__OS_DEBUG=true\"], \"image\": \"brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0\", \"net\": \"host\", \"start_order\": 0, \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro\", \"/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro\", \"/var/log/containers/nova:/var/log/nova\", \"/var/lib/container-config-scripts/:/container-config-scripts/\"]}', '--conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid', '--log-driver', 'json-file', '--log-opt', 'path=/var/log/containers/stdouts/nova_cell_v2_discover_hosts.log', '--env=TRIPLEO_DEPLOY_IDENTIFIER=1557430479', '--env=__OS_DEBUG=true', '--net=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro', '--volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro', '--volume=/var/log/containers/nova:/var/log/nova', '--volume=/var/lib/container-config-scripts/:/container-config-scripts/', 'brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0', '/usr/bin/bootstrap_host_exec', 'nova_compute', 'su', 'nova', '-s', '/bin/bash', '-c', \"'/container-config-scripts/pyshim.sh\", \"/container-config-scripts/nova_cell_v2_discover_hosts.py'\"]. [1]",
        "",
        "stdout: ",
        "stderr: su: cannot open session: Permission denied"
    ]
}

Expected results:
Overcloud deploys without errors.

Other info:
Logs and template files available on request.