Description of problem:
Overcloud deployment fails with container permissions error.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190509.n.0 puddle
RHOS_TRUNK-15.0-RHEL-8-20190509.n.1 puddle

How reproducible:
Always

Steps to Reproduce:
1. Deploy undercloud using tripleo
2. Deploy overcloud using tripleo

Actual results:
fatal: [overcloud-novacomputeppc64le-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Error running ['podman', 'run', '--name', 'nova_cell_v2_discover_hosts', '--label', 'config_id=tripleo_step5', '--label', 'container_name=nova_cell_v2_discover_hosts', '--label', 'managed_by=paunch', '--label', 'config_data={\"command\": \"/usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c \\'/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\\'\", \"detach\": false, \"environment\": [\"TRIPLEO_DEPLOY_IDENTIFIER=1557430479\", \"__OS_DEBUG=true\"], \"image\": \"brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0\", \"net\": \"host\", \"start_order\": 0, \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro\", \"/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro\", \"/var/log/containers/nova:/var/log/nova\", \"/var/lib/container-config-scripts/:/container-config-scripts/\"]}', '--conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid', '--log-driver', 'json-file', '--log-opt', 'path=/var/log/containers/stdouts/nova_cell_v2_discover_hosts.log', '--env=TRIPLEO_DEPLOY_IDENTIFIER=1557430479', '--env=__OS_DEBUG=true', '--net=host', '--user=root', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro', '--volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro', '--volume=/var/log/containers/nova:/var/log/nova', '--volume=/var/lib/container-config-scripts/:/container-config-scripts/', 'brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0', '/usr/bin/bootstrap_host_exec', 'nova_compute', 'su', 'nova', '-s', '/bin/bash', '-c', \"'/container-config-scripts/pyshim.sh\", \"/container-config-scripts/nova_cell_v2_discover_hosts.py'\"]. [1]",
        "",
        "stdout: ",
        "stderr: su: cannot open session: Permission denied"
    ]
}

Expected results:
Overcloud deploys without errors.

Other info:
Logs and template files available on request.
Could this be run again in permissive mode, and could the resulting audit.log file then be attached to this bug? Thank you.
Created attachment 1577155
selinux audit log

Audit log from tripleo install in selinux permissive mode. I believe this is the oldest one, and should contain the audits produced during the tripleo overcloud installation.
To clarify a bit further, all of the deployments so far have been run in permissive mode, and we are still seeing these errors. It is not entirely clear to us whether this is selinux or some other permissions issue. Thanks!
Thank you for the file and additional context. Do you mean the deployment fails with the "Permission Denied" error even with SELinux set to permissive?

There are some AVC denials related to dbus sockets and containers that may be of interest, e.g.

type=AVC msg=audit(1559585457.605:2442): avc: denied { connectto } for pid=40544 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c432,c848 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1
type=AVC msg=audit(1559585460.222:2453): avc: denied { connectto } for pid=40825 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c16,c110 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1

and a bunch of other denials related to rhsmcertd and dnf/rpms that may not be, if we are using internal puddles:

type=AVC msg=audit(1559596349.030:14473): avc: denied { open } for pid=321713 comm="rhsmcertd-worke" path="/var/cache/dnf/metadata_lock.pid" dev="dm-0" ino=107977532 scontext=system_u:system_r:rhsmcertd_t:s0 tcontext=system_u:object_r:rpm_var_cache_t:s0 tclass=file permissive=1
type=AVC msg=audit(1559596349.203:14474): avc: denied { open } for pid=321713 comm="rhsmcertd-worke" path="/var/cache/dnf/rhel-8-for-x86_64-appstream-rpms-9d3886b51bb367d7/repodata/repomd.xml" dev="dm-0" ino=40090136 scontext=system_u:system_r:rhsmcertd_t:s0 tcontext=unconfined_u:object_r:rpm_var_cache_t:s0 tclass=file permissive=1

Are the overcloud deployment log files also available, to pinpoint the full failure error message?
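For anyone triaging a similar audit log, the container denials can be separated from the rhsmcertd noise by filtering on the source context. The following is only a rough sketch: the function name `avc_by_domain` is made up for illustration, and the log path in the usage comment is an assumption about where the attachment was saved.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool mentioned in this bug):
# reads audit records on stdin and counts AVC denials whose source
# context (scontext) carries the given SELinux type, grouped by the
# command name (comm) and target class (tclass).
avc_by_domain() {
    local domain="$1"
    grep 'avc: *denied' \
        | grep "scontext=[^ ]*:${domain}:" \
        | sed -n 's/.* comm="\([^"]*\)".* tclass=\([^ ]*\).*/\1 \2/p' \
        | sort | uniq -c
}

# Usage sketch (path is an assumption):
#   avc_by_domain container_t < /var/log/audit/audit.log
#   avc_by_domain rhsmcertd_t < /var/log/audit/audit.log
```

With permissive=1 these denials are logged but not enforced, so a filter like this helps with later cleanup rather than with the failure itself.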
Although there are SELinux-related messages we'll likely want to clean up in the future, this is not the source of the current problem because the failure still occurs when running in permissive mode.

(undercloud) [stack@director-ci07 ~]$ getenforce
Permissive

[heat-admin@overcloud-novacomputeppc64le-0 ~]$ getenforce
Permissive

On the compute node, I found this message in /var/log/secure-20190609 that seems to match the failure timestamp for the error in comment 0:

Jun 3 19:44:18 overcloud-novacomputeppc64le-0 su[38956]: pam_limits(su:session): Could not set limit for 'memlock': Operation not permitted

I'm not sure if this is related to the container nova_cell_v2_discover_hosts in particular or if it's something more generic, so I'm moving this to DFG Compute for further investigation at this point.

The reporter has a live environment available for debugging and indicated that the problem started occurring sometime between puddles 2019-05-07 (last known success with tag 2019-05-07 15:38:36) and 2019-05-09 (when it was definitely failing).
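The pam_limits message above is the kind of line worth searching for on other nodes. A minimal sketch of pulling such failures out of a syslog-style auth log follows; the function name `pam_limit_failures` is hypothetical, and the log file name in the usage comment is simply the one quoted in this comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a syslog-style auth log on stdin and prints
# one line per pam_limits failure in the form
#   <service:type> <limit name> <error text>
pam_limit_failures() {
    sed -n "s/.*pam_limits(\([^)]*\)): Could not set limit for '\([^']*\)': \(.*\)/\1 \2 \3/p"
}

# Usage sketch:
#   sudo cat /var/log/secure-20190609 | pam_limit_failures | sort | uniq -c
```

A result naming `su:session` and `memlock` would be consistent with the "su: cannot open session: Permission denied" error in comment 0, though that link is an inference from the matching timestamps, not something the log states directly.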
This could be an ulimit issue. Can this be reproduced outside the deploy, just by running:

$ podman exec -it -u root nova_compute su nova -s /bin/bash
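If the locked-memory limit is the suspect, the limits actually in force for a process can also be read from /proc. This is a sketch only: `memlock_limits` is a made-up helper name, and the `podman inspect` call in the usage comment assumes the container is running and that the command is run as root on the compute node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the soft and hard "Max locked memory" limits
# of a process, as recorded by the kernel in /proc/<pid>/limits.
# Values are in bytes (or "unlimited"), whereas "ulimit -l" reports kbytes.
memlock_limits() {
    awk '/^Max locked memory/ { print $4, $5 }' "/proc/$1/limits"
}

# Usage sketch (not executed here):
#   memlock_limits "$$"     # the current shell
#   memlock_limits "$(podman inspect --format '{{.State.Pid}}' nova_compute)"
```

Comparing the shell's values with the container's main process should show whether the container inherited a lower hard limit than the one pam_limits is trying to set.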
> This could be an ulimit issue. Can this be reproduced outside the deploy, just by running:
>
> $ podman exec -it -u root nova_compute su nova -s /bin/bash

The container seems to run alright when podman is executed from root, but the container is not visible to the heat-admin user.

```
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ podman exec -it -u root nova_compute su nova -s /bin/bash
unable to exec into nova_compute: no container with name or ID nova_compute found: no such container
[heat-admin@overcloud-novacomputeppc64le-0 ~]$ sudo podman exec -it -u root nova_compute su nova -s /bin/bash
()[nova@overcloud-novacomputeppc64le-0 /]$
```
(In reply to Martin Schuppert from comment #8)
> This could be an ulimit issue. Can this be reproduced outside the deploy,
> just by running:
>
> $ podman exec -it -u root nova_compute su nova -s /bin/bash

That works for me:
[stack@director ~]$ sudo podman exec -it -u root nova_compute su nova -s /bin/bash
()[nova@director /]$ exit

The "good" news is I'm now hitting this too ;P
(In reply to Tony Breeds from comment #10)
> (In reply to Martin Schuppert from comment #8)
> > This could be an ulimit issue. Can this be reproduced outside the deploy,
> > just by running:
> >
> > $ podman exec -it -u root nova_compute su nova -s /bin/bash
>
> That works for me:
> [stack@director ~]$ sudo podman exec -it -u root nova_compute su nova -s
> /bin/bash
> ()[nova@director /]$ exit
>
> The "good" news is I'm now hitting this too ;P

And of course it "worked" because I ran it on the director, not the ppc64le compute node :(
Looks like it is ulimits

[root@overcloud-novacomputeppc64le-0 ~]# bash /tmp/1709564_repro.sh
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 425883
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 425883
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
+ podman rm nova_cell_v2_discover_hosts1
57f74e9f6c451821402768cf6a6602b79f977a0cb9644b26b550fc2bf620f74c
+ podman run --name nova_cell_v2_discover_hosts1 --label config_id=tripleo_step5 --label container_name=nova_cell_v2_discover_hosts --label managed_by=paunch --label ''\''config_data={"command": "/usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c \'\''/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\'\''", "detach": false, "environment": ["TRIPLEO_DEPLOY_IDENTIFIER=1560841493", "__OS_DEBUG=true"], "image": "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44", "net": "host", "start_order": 0, "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro", "/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro", "/var/log/containers/nova:/var/log/nova", "/var/lib/container-config-scripts/:/container-config-scripts/"]}'\''' --conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid --log-driver json-file --log-opt path=/var/log/containers/stdouts/nova_cell_v2_discover_hosts.log --env=TRIPLEO_DEPLOY_IDENTIFIER=1560841493 --env=__OS_DEBUG=true --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro --volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/container-config-scripts/:/container-config-scripts/ brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44 /usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c '/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py'
su: cannot open session: Permission denied
+ ulimit -l unlimited
+ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 425883
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 425883
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
+ podman rm nova_cell_v2_discover_hosts1
99aeef0b277088a9d2cc2aa3ebb4cda9f86c42bc18981cc637ec3c2736c59a41
+ podman run --name nova_cell_v2_discover_hosts1 --label config_id=tripleo_step5 --label container_name=nova_cell_v2_discover_hosts --label managed_by=paunch --label ''\''config_data={"command": "/usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c \'\''/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py\'\''", "detach": false, "environment": ["TRIPLEO_DEPLOY_IDENTIFIER=1560841493", "__OS_DEBUG=true"], "image": "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44", "net": "host", "start_order": 0, "user": "root", "volumes": ["/etc/hosts:/etc/hosts:ro", "/etc/localtime:/etc/localtime:ro", "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro", "/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro", "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro", "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro", "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro", "/dev/log:/dev/log", "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro", "/etc/puppet:/etc/puppet:ro", "/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro", "/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro", "/var/log/containers/nova:/var/log/nova", "/var/lib/container-config-scripts/:/container-config-scripts/"]}'\''' --conmon-pidfile=/var/run/nova_cell_v2_discover_hosts.pid --log-driver json-file --log-opt path=/var/log/containers/stdouts/nova_cell_v2_discover_hosts.log --env=TRIPLEO_DEPLOY_IDENTIFIER=1560841493 --env=__OS_DEBUG=true --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/nova_libvirt/etc/my.cnf.d/:/etc/my.cnf.d/:ro --volume=/var/lib/config-data/nova_libvirt/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/container-config-scripts/:/container-config-scripts/ brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-nova-compute:15.0-44 /usr/bin/bootstrap_host_exec nova_compute su nova -s /bin/bash -c '/container-config-scripts/pyshim.sh /container-config-scripts/nova_cell_v2_discover_hosts.py'
Usage: pyshim.sh <script and/or arguments>
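The script /tmp/1709564_repro.sh itself is not attached, so the following is only a guess at the shape of such an A/B test, reconstructed from the trace above. The helper name `run_with_memlock` is invented for illustration, and the podman arguments in the usage comment are elided on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a specific "max locked memory"
# limit. The limit is applied inside a subshell, so the calling shell's
# own limits are left untouched. Raising the limit generally needs root.
run_with_memlock() {
    local limit="$1"
    shift
    ( ulimit -l "$limit" && "$@" )
}

# Usage sketch (as root on the compute node; arguments elided):
#   run_with_memlock 16384     podman run --name nova_cell_v2_discover_hosts1 ...
#   run_with_memlock unlimited podman run --name nova_cell_v2_discover_hosts1 ...
```

In the trace, the run under the 16384 kbytes limit stops at "su: cannot open session: Permission denied", while the run after `ulimit -l unlimited` gets past `su` and fails later with the pyshim.sh usage message, which is a different error.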
Thanks for confirming that it's the ulimit issue, and for the submitted patches!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811