Description of problem:
neutron-dhcp fails to spawn the DHCP process for the ctlplane network on the undercloud.

Version-Release number of selected component (if applicable):
podman-1.0.5-1.gitf604175.module+el8.0.0+4017+bbba319f.x86_64

REPOSITORY                              TAG         IMAGE ID
/rhosp15/openstack-neutron-dhcp-agent   20190904.3  4ba2d12e64e3

How reproducible:
Intermittent. Have only seen the first occurrence.

Steps to Reproduce:
1. Deploy undercloud - all is OK.
2. Reboot undercloud - neutron dhcp namespace fails.

Actual results:
2019-09-11 16:12:11.498 117735 DEBUG neutron.agent.linux.dhcp [-] Spawning DHCP process for network c272c5bf-5a6b-4629-91de-f12bd532f230 failed;
Error: Exit code: 126;
Stdin: ;
Stdout: Removing orphaned container fae7546af89d
fae7546af89dacb21833676bb765d027b4dee9d880871cbc5d6c71977064c1c5
Starting a new child container neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230
;
Stderr: + export DOCKER_HOST=
+ DOCKER_HOST=
+ ARGS='--no-hosts --no-resolv --pid-file=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/host --addn-hosts=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/opts --dhcp-leasefile=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-dynamic --dhcp-range=set:tag0,192.168.24.0,static,255.255.255.0,86400s --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256 --conf-file= --domain=localdomain'
++ ip netns identify
+ NETNS=qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230
+ NAME=neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230
+ CLI='nsenter --net=/run/netns/qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230 --preserve-credentials -m -t 1 podman'
+ LOGGING='--log-driver json-file --log-opt path=/var/log/containers/stdouts/neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230.log'
+ CMD='/usr/sbin/dnsmasq -k'
++ nsenter --net=/run/netns/qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230 --preserve-credentials -m -t 1 podman ps -a --filter name=neutron-dnsmasq- --format '{{.ID}}:{{.Names}}:{{.Status}}'
++ awk '{print $1}'
+ LIST=fae7546af89d:neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230:Created
++ printf '%s\n' fae7546af89d:neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230:Created
++ grep -E ':(Exited|Created)'
+ ORPHANTS=fae7546af89d:neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230:Created
+ '[' -n fae7546af89d:neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230:Created ']'
++ printf '%s\n' fae7546af89d:neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230:Created
++ awk -F: '{print $1}'
+ for orphant in $(printf "%s\n" "${ORPHANTS}" | awk -F':' '{print $1}')
+ echo 'Removing orphaned container fae7546af89d'
+ nsenter --net=/run/netns/qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230 --preserve-credentials -m -t 1 podman stop fae7546af89d
can only stop created, running, or stopped containers: container state improper
+ true
+ nsenter --net=/run/netns/qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230 --preserve-credentials -m -t 1 podman rm -f fae7546af89d
+ printf '%s\n' fae7546af89d:neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230:Created
+ grep -q 'neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230$'
+ echo 'Starting a new child container neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230'
+ nsenter --net=/run/netns/qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230 --preserve-credentials -m -t 1 podman run --detach --log-driver json-file --log-opt path=/var/log/containers/stdouts/neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230.log -v /var/lib/config-data/puppet-generated/neutron/etc/neutron:/etc/neutron:ro -v /run/netns:/run/netns:shared -v /var/lib/neutron:/var/lib/neutron:z,shared -v /dev/log:/dev/log --net host --pid host --privileged -u root --name neutron-dnsmasq-qdhcp-c272c5bf-5a6b-4629-91de-f12bd532f230 192.168.24.1:8787/rhosp15/openstack-neutron-dhcp-agent:20190904.3 /usr/sbin/dnsmasq -k --no-hosts --no-resolv --pid-file=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/host --addn-hosts=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/opts --dhcp-leasefile=/var/lib/neutron/dhcp/c272c5bf-5a6b-4629-91de-f12bd532f230/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-dynamic --dhcp-range=set:tag0,192.168.24.0,static,255.255.255.0,86400s --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256 --conf-file= --domain=localdomain
container create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"write /proc/self/attr/keycreate: permission denied\"" : internal libpod error _enable /usr/lib/python3.6/site-packages/neutron/agent/linux/dhcp.py:235

Expected results:
neutron-dhcp should successfully spawn the DHCP process.

Additional info:
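For reference, the orphan-cleanup logic in the wrapper's stderr trace above reduces to roughly the following. This is a simplified sketch: `podman` is stubbed with a shell function and the container name is illustrative, so the flow can be followed without a container runtime.

```shell
# Simplified sketch of the dnsmasq wrapper's orphan-cleanup loop.
# "podman" is a stub; the name below is illustrative, not from a real host.
NAME=neutron-dnsmasq-qdhcp-example

# Stub: pretend one leftover container exists in "Created" state.
podman() {
  case "$1" in
    ps)   echo "fae7546af89d:${NAME}:Created" ;;
    stop) echo "stopped $2" ;;
    rm)   echo "removed $3" ;;   # invoked as: podman rm -f <id>
  esac
}

# List containers matching the agent's prefix, keep only the dead ones.
LIST=$(podman ps -a --filter "name=neutron-dnsmasq-" \
         --format '{{.ID}}:{{.Names}}:{{.Status}}')
ORPHANS=$(printf '%s\n' "$LIST" | grep -E ':(Exited|Created)')

# Stop each orphan (ignoring failures, as the real script does) and
# force-remove it before a replacement container is started.
if [ -n "$ORPHANS" ]; then
  for orphan in $(printf '%s\n' "$ORPHANS" | awk -F: '{print $1}'); do
    echo "Removing orphaned container ${orphan}"
    podman stop "$orphan" || true
    podman rm -f "$orphan"
  done
fi
```

This is why the trace shows a `podman stop` failure followed by `+ true`: a container stuck in `Created` was never running, so the stop is expected to fail and the script only relies on the `rm -f`.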
*** Bug 1750966 has been marked as a duplicate of this bug. ***
So node provisioning and cleaning will fail due to this issue. To reproduce, simply try to deploy the overcloud or to clean nodes on the undercloud. It seems to reproduce consistently.
Not sure if this libpod error is specific to the neutron-dhcp-agent or to podman. Using: podman-1.0.5-1.gitf604175.module+el8.0.0+4017+bbba319f.x86_64

Including the Networking DFG in case they have seen this.
Created attachment 1614258 [details] neutron dhcp-agent.log
This may be SELinux related; in the logs captured on Bz1750966 I see these in /undercloud-0/var/log/audit/audit.log:

type=AVC msg=audit(1568145299.277:2945): avc: denied { create } for pid=77464 comm="runc:[2:INIT]" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=key permissive=0
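An AVC record carries the three fields that matter for any fix: the source context (the domain the process ran in), the target context (the label of the object it touched), and the object class. A quick way to pull them out of the record above, as a sketch using only standard text tools:

```shell
# Extract scontext/tcontext/tclass from the AVC denial record shown above.
avc='type=AVC msg=audit(1568145299.277:2945): avc: denied { create } for pid=77464 comm="runc:[2:INIT]" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=key permissive=0'

# Split the record on spaces, then match key=value pairs by key name.
field() { printf '%s\n' "$avc" | tr ' ' '\n' | awk -F= -v k="$1" '$1 == k {print $2}'; }

echo "source context: $(field scontext)"   # system_u:system_r:spc_t:s0
echo "target context: $(field tcontext)"   # system_u:object_r:unlabeled_t:s0
echo "object class  : $(field tclass)"     # key
```

Here the source domain is spc_t (a "super-privileged container") and the target is an unlabeled kernel keyring, which matches the `write /proc/self/attr/keycreate: permission denied` failure in the container trace.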
$ sudo tail -f /var/log/audit/audit.log | grep 'avc: denied'
<< quiet no errors >>

Then run:

$ systemctl restart tripleo_neutron_dhcp.service

A few seconds later these show up:

type=AVC msg=audit(1568239901.789:19443): avc: denied { create } for pid=186828 comm="runc:[2:INIT]" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=key permissive=0
type=AVC msg=audit(1568239903.598:19446): avc: denied { create } for pid=186993 comm="runc:[2:INIT]" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=key permissive=0
type=AVC msg=audit(1568239905.370:19451): avc: denied { create } for pid=187192 comm="runc:[2:INIT]" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=key permissive=0
type=AVC msg=audit(1568239907.174:19452): avc: denied { create } for pid=187343 comm="runc:[2:INIT]" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=key permissive=0

Workaround:

$ setenforce permissive && systemctl restart tripleo_neutron_dhcp.service

The neutron DHCP server is up:

[stack@undercloud-0 ~]$ ps aux | grep dnsmasq | grep neutron
root 200345 0.0 0.0 85976 1816 ? Ssl 22:15 0:00 /usr/libexec/podman/conmon -s -c aa908de07fd88b7c37e4d2d9715e94f1ff51706ab62a93631029cffe673e53da -u aa908de07fd88b7c37e4d2d9715e94f1ff51706ab62a93631029cffe673e53da -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/aa908de07fd88b7c37e4d2d9715e94f1ff51706ab62a93631029cffe673e53da/userdata -p /var/run/containers/storage/overlay-containers/aa908de07fd88b7c37e4d2d9715e94f1ff51706ab62a93631029cffe673e53da/userdata/pidfile -l /var/log/containers/stdouts/neutron-dnsmasq-qdhcp-ad57e457-9aaf-4aed-8136-2f9e16583958.log --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-command-arg cleanup --exit-command-arg aa908de07fd88b7c37e4d2d9715e94f1ff51706ab62a93631029cffe673e53da --socket-dir-path /var/run/libpod/socket --log-level error
root 200358 0.0 0.0 4208 820 ? Ss 22:15 0:00 dumb-init --single-child -- /usr/sbin/dnsmasq -k --no-hosts --no-resolv --pid-file=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/host --addn-hosts=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/opts --dhcp-leasefile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-dynamic --dhcp-range=set:tag0,192.168.24.0,static,255.255.255.0,86400s --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256 --conf-file= --domain=localdomain
insights 200373 0.6 0.0 56868 4532 ? S 22:15 0:00 /usr/sbin/dnsmasq -k --no-hosts --no-resolv --pid-file=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/host --addn-hosts=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/opts --dhcp-leasefile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-dynamic --dhcp-range=set:tag0,192.168.24.0,static,255.255.255.0,86400s --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256 --conf-file= --domain=localdomain
It sounds like some labelling issue, and this rule would help:

allow spc_t unlabeled_t:key create;

Note that the rule is allowed on my laptop; I wonder if we need a new selinux policy or something.

Can you please run:

grep AVC /var/log/audit/audit.log | audit2allow -m spc_t > spc_t.te

On my laptop, it produces:

$ cat spc_t.te
module spc_t 1.0;

require {
	type spc_t;
	type unlabeled_t;
	class key create;
}

#============= spc_t ==============

#!!!! This avc is allowed in the current policy
allow spc_t unlabeled_t:key create;

Please run:

grep AVC /var/log/audit/audit.log | audit2allow -m spc_t

And reboot again, see if it helped.
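For what it's worth, audit2allow mechanically maps the denial's source type, target type, object class, and denied permission onto an allow rule. The sketch below reproduces that mapping for the single denial in this bug using plain awk; it is illustrative only, not a substitute for the real tool:

```shell
# Derive an allow rule from the AVC fields, mirroring what audit2allow
# emits for the denial in this bug (simplified illustration).
avc='type=AVC msg=audit(1568145299.277:2945): avc: denied { create } for pid=77464 comm="runc:[2:INIT]" scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=key permissive=0'

rule=$(printf '%s\n' "$avc" | awk '{
  for (i = 1; i <= NF; i++) {
    # The type is the third colon-separated part of a context: user:role:type:level
    if ($i ~ /^scontext=/) { split($i, s, ":"); src = s[3] }
    if ($i ~ /^tcontext=/) { split($i, t, ":"); tgt = t[3] }
    if ($i ~ /^tclass=/)   { sub("tclass=", "", $i); cls = $i }
    # The denied permission sits between the braces: { create }
    if ($i == "{") perm = $(i + 1)
  }
  printf "allow %s %s:%s %s;", src, tgt, cls, perm
}')
echo "$rule"   # allow spc_t unlabeled_t:key create;
```

This makes it easy to see why the suggested rule is `allow spc_t unlabeled_t:key create;`: spc_t comes from the scontext, unlabeled_t from the tcontext, and key/create from the class and permission.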
Changing component as this looks like an SELinux issue.
(In reply to Emilien Macchi from comment #9)
> it sounds like some labelling issue, and this rule would help:
>
> allow spc_t unlabeled_t:key create;
>
> Note that the rule is allowed on my laptop, I wonder if we need a new
> selinux or something.
> Can you please run:
>
> grep AVC /var/log/audit/audit.log | audit2allow -m spc_t > spc_t.te

[root@undercloud-0 ~]# cat spc_t.te
module spc_t 1.0;

require {
	type spc_t;
	type system_dbusd_t;
	type container_t;
	type unlabeled_t;
	class dbus send_msg;
	class key create;
}

#============= container_t ==============
allow container_t system_dbusd_t:dbus send_msg;

#============= spc_t ==============
allow spc_t unlabeled_t:key create;

> Please run:
> grep AVC /var/log/audit/audit.log | audit2allow -m spc_t
>
> And reboot again, see if it helped.

[root@undercloud-0 ~]# grep AVC /var/log/audit/audit.log | audit2allow -M spc_t
******************** IMPORTANT ***********************
To make this policy package active, execute:

semodule -i spc_t.pp

[root@undercloud-0 ~]# semodule -i spc_t.pp
[root@undercloud-0 ~]# systemctl restart tripleo_neutron_dhcp.service
[root@undercloud-0 ~]# ps aux | grep dnsmasq | grep neutron
root 45416 0.0 0.0 85976 1828 ? Ssl 23:43 0:00 /usr/libexec/podman/conmon -s -c de74cc776c21899c7226f25e746c8cd93c7685bbb0b1e42d07e514eceb7aef27 -u de74cc776c21899c7226f25e746c8cd93c7685bbb0b1e42d07e514eceb7aef27 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/de74cc776c21899c7226f25e746c8cd93c7685bbb0b1e42d07e514eceb7aef27/userdata -p /var/run/containers/storage/overlay-containers/de74cc776c21899c7226f25e746c8cd93c7685bbb0b1e42d07e514eceb7aef27/userdata/pidfile -l /var/log/containers/stdouts/neutron-dnsmasq-qdhcp-ad57e457-9aaf-4aed-8136-2f9e16583958.log --exit-dir /var/run/libpod/exits --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-command-arg cleanup --exit-command-arg de74cc776c21899c7226f25e746c8cd93c7685bbb0b1e42d07e514eceb7aef27 --socket-dir-path /var/run/libpod/socket --log-level error
root 45428 0.1 0.0 4208 840 ? Ss 23:43 0:00 dumb-init --single-child -- /usr/sbin/dnsmasq -k --no-hosts --no-resolv --pid-file=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/host --addn-hosts=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/opts --dhcp-leasefile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-dynamic --dhcp-range=set:tag0,192.168.24.0,static,255.255.255.0,86400s --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256 --conf-file= --domain=localdomain
insights 45443 5.0 0.0 56868 4640 ? S 23:43 0:00 /usr/sbin/dnsmasq -k --no-hosts --no-resolv --pid-file=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/host --addn-hosts=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/opts --dhcp-leasefile=/var/lib/neutron/dhcp/ad57e457-9aaf-4aed-8136-2f9e16583958/leases --dhcp-match=set:ipxe,175 --dhcp-userclass=set:ipxe6,iPXE --local-service --bind-dynamic --dhcp-range=set:tag0,192.168.24.0,static,255.255.255.0,86400s --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=256 --conf-file= --domain=localdomain
It looks like the upstream fix was in container-selinux 2.100 (or maybe 2.109, looking at the tag): https://github.com/containers/container-selinux/commit/3b7818. I'll add the rule from comment 9 to openstack-selinux in the meantime.
Would it be possible to get a copy of the audit.log file when the command was run in permissive mode, to make sure we're not missing anything? Thank you.
Created attachment 1614449 [details]
Audit logs from the reproducer - was running in permissive mode -> then rebooted -> then rules added to policy with audit2allow

The attached file should have it all. I hope it's easy to see when things switch between permissive=0 and permissive=1 in the logs.
Hi, so I've applied the change from https://bugzilla.redhat.com/show_bug.cgi?id=1751559#c2 on an OSP15 env during a "stuck" deployment, and then it went on. I still have the env up and running if needed.
*** Bug 1751559 has been marked as a duplicate of this bug. ***
Thank you for the logs. I can only see the one spc_t / key create denial on the permissive run. The dbus denials appear unrelated, and didn't show up at all during the permissive run. I think PR #43 linked above will be enough to resolve the issue; Sofer is deploying another environment to confirm.
Hi, so July and I deployed an OSP15 undercloud RC-0.9 (with podman 1.0.5 in it) and saw the overcloud deployment failure. We deleted the overcloud and applied this:

[root@undercloud-0 policy]# cat local.te
policy_module(local, 1.1)

gen_require(`
	type unlabeled_t;
	type spc_t;
')

allow spc_t unlabeled_t:key manage_key_perms;

then:

make -f /usr/share/selinux/devel/Makefile local.pp
semodule -i local.pp

We didn't restart any service and then retriggered an OSP15 overcloud deployment, which went past the deployment of the servers.

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| 7dc5e609-6072-4b07-93d0-347f8fa8fef6 | compute-0    | 41d9d5c9-4d56-4278-9862-51a3b5f9a1de | power on    | active             | False       |
| 2c398bfb-faf3-4781-8957-c4c9c1d91c4e | controller-0 | 9a694ec5-d31a-4553-97e4-f36e51f188fa | power on    | active             | False       |
| 213558e2-1c9e-47ec-ba1a-0c2e3f301932 | controller-1 | 81bf05a0-9589-485c-a2f3-d20928cf39f0 | power on    | active             | False       |
| 42445b7b-f53e-4bf4-9cb8-45bdaf963302 | controller-2 | fd3ec193-e863-4419-b899-d94e82fa224a | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

Thanks,
This is fixed by an update to container-selinux.
Env: openstack-selinux-0.8.20-0.20190912133707.089066f.el8ost.noarch

I re-tested this with the latest compose (RHOS_TRUNK-15.0-RHEL-8-20190913.n.3) and no longer see the issue. All nodes deployed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811