Description of problem:
On the undercloud, some of the neutron containers (neutron_ovs_agent, neutron_l3_agent, neutron_dhcp) fail to start at boot time because they cannot mount /run/openvswitch.

[root@undercloud stack]# journalctl -l -u tripleo_neutron_l3_agent.service | grep -i -B1 error
Jan 16 14:44:06 undercloud.localdomain podman[7343]: unable to start container 550ab4d11bf342de083019709d55c45a2b157a7975614a340b6e112bd90e9d0a: container create failed: container_linux.go:336: starting container process caused "process_linux.go:399: container init caused \"rootfs_linux.go:58: mounting \\\"/run/openvswitch\\\" to rootfs \\\"/var/lib/containers/storage/overlay/133bf5712b528ae74a84cbf55db3cd28d0ca12eed2ebf1c1dbb68c69285778fb/merged\\\" at \\\"/run/openvswitch\\\" caused \\\"stat /run/openvswitch: no such file or directory\\\"\""
Jan 16 14:44:06 undercloud.localdomain podman[7343]: : internal libpod error

Checking the openvswitch log, we can see that it starts roughly 85 seconds later (the timestamps differ because the journal is in EST while the openvswitch log is in UTC; 14:44:06 EST is 19:44:06 UTC):
2019-01-16T19:45:31.281Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log

Version-Release number of selected component (if applicable):
openstack-tripleo-image-elements-10.1.1-0.20190111004154.99e6a5a.fc28.noarch
openstack-tripleo-puppet-elements-10.0.1-0.20190108154243.bdf1104.fc28.noarch
openstack-tripleo-common-10.3.1-0.20190115214116.f50b35e.fc28.noarch
puppet-tripleo-10.2.1-0.20190115195112.a2c549a.fc28.noarch
ansible-tripleo-ipsec-9.0.1-0.20190115094401.8b37e93.fc28.noarch
openstack-tripleo-heat-templates-10.3.1-0.20190116115308.d747625.fc28.noarch
python3-tripleoclient-heat-installer-11.2.1-0.20190116115554.72a9f50.fc28.noarch
python3-tripleoclient-11.2.1-0.20190116115554.72a9f50.fc28.noarch
python3-tripleo-common-10.3.1-0.20190115214116.f50b35e.fc28.noarch
openstack-tripleo-validations-10.2.1-0.20190115104503.abdd15f.fc28.noarch
openstack-tripleo-common-containers-10.3.1-0.20190115214116.f50b35e.fc28.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20190114124825.d67f1ef.fc28.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install the undercloud on RHEL8.
2. Reboot the undercloud (including the workaround for BZ#1666387).
3. Check the status of the neutron_ovs_agent, neutron_l3_agent and neutron_dhcp containers.

Actual results:
The containers are not started.

Expected results:
The containers are started.

Additional info:
tripleo_neutron_l3_agent.service eventually ends up in the failed state before the openvswitch service has started. Restarting the service manually once openvswitch is up works fine and gets the containers started.

Jan 16 14:44:02 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Main process exited, code=exited, status=125/n/a
Jan 16 14:44:02 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:02 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Service RestartSec=100ms expired, scheduling restart.
Jan 16 14:44:02 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Scheduled restart job, restart counter is at 1.
Jan 16 14:44:02 undercloud.localdomain systemd[1]: Stopped neutron_l3_agent container.
Jan 16 14:44:02 undercloud.localdomain systemd[1]: Started neutron_l3_agent container.
Jan 16 14:44:06 undercloud.localdomain podman[7343]: unable to start container 550ab4d11bf342de083019709d55c45a2b157a7975614a340b6e112bd90e9d0a: container create failed: container_linux.go:336: starting container process caused "process_>
Jan 16 14:44:06 undercloud.localdomain podman[7343]: : internal libpod error
Jan 16 14:44:06 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Main process exited, code=exited, status=125/n/a
Jan 16 14:44:06 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:07 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Service RestartSec=100ms expired, scheduling restart.
Jan 16 14:44:07 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Scheduled restart job, restart counter is at 2.
Jan 16 14:44:07 undercloud.localdomain systemd[1]: Stopped neutron_l3_agent container.
Jan 16 14:44:07 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Found left-over process 9284 (podman) in control group while starting unit. Ignoring.
Jan 16 14:44:07 undercloud.localdomain systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jan 16 14:44:07 undercloud.localdomain systemd[1]: Started neutron_l3_agent container.
Jan 16 14:44:08 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Main process exited, code=exited, status=125/n/a
Jan 16 14:44:08 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:09 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Service RestartSec=100ms expired, scheduling restart.
Jan 16 14:44:09 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Scheduled restart job, restart counter is at 3.
Jan 16 14:44:09 undercloud.localdomain systemd[1]: Stopped neutron_l3_agent container.
Jan 16 14:44:09 undercloud.localdomain systemd[1]: Started neutron_l3_agent container.
Jan 16 14:44:11 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Main process exited, code=exited, status=125/n/a
Jan 16 14:44:11 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:11 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Service RestartSec=100ms expired, scheduling restart.
Jan 16 14:44:11 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Scheduled restart job, restart counter is at 4.
Jan 16 14:44:11 undercloud.localdomain systemd[1]: Stopped neutron_l3_agent container.
Jan 16 14:44:11 undercloud.localdomain systemd[1]: Started neutron_l3_agent container.
Jan 16 14:44:13 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Main process exited, code=exited, status=125/n/a
Jan 16 14:44:13 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:13 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Service RestartSec=100ms expired, scheduling restart.
Jan 16 14:44:13 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Scheduled restart job, restart counter is at 5.
Jan 16 14:44:13 undercloud.localdomain systemd[1]: Stopped neutron_l3_agent container.
Jan 16 14:44:13 undercloud.localdomain systemd[1]: Started neutron_l3_agent container.
Jan 16 14:44:14 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Main process exited, code=exited, status=125/n/a
Jan 16 14:44:14 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:14 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Service RestartSec=100ms expired, scheduling restart.
Jan 16 14:44:14 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Scheduled restart job, restart counter is at 6.
Jan 16 14:44:14 undercloud.localdomain systemd[1]: Stopped neutron_l3_agent container.
Jan 16 14:44:14 undercloud.localdomain systemd[1]: Started neutron_l3_agent container.
Jan 16 14:44:14 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Main process exited, code=exited, status=125/n/a
Jan 16 14:44:14 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:15 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Service RestartSec=100ms expired, scheduling restart.
Jan 16 14:44:15 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Scheduled restart job, restart counter is at 7.
Jan 16 14:44:15 undercloud.localdomain systemd[1]: Stopped neutron_l3_agent container.
Jan 16 14:44:15 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Start request repeated too quickly.
Jan 16 14:44:15 undercloud.localdomain systemd[1]: tripleo_neutron_l3_agent.service: Failed with result 'exit-code'.
Jan 16 14:44:15 undercloud.localdomain systemd[1]: Failed to start neutron_l3_agent container.
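Until the unit dependencies are fixed, the manual workaround described above (restart the agents once openvswitch is up) could be scripted roughly as follows. This is a sketch under assumptions: wait_for_path and restart_neutron_agents are hypothetical helpers, and only tripleo_neutron_l3_agent.service appears in the logs; the unit names tripleo_neutron_ovs_agent.service and tripleo_neutron_dhcp.service are inferred from the container names. systemctl reset-failed is included because the unit hit its start rate limit ("Start request repeated too quickly"), and reset-failed also clears that counter.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until a path exists.
# Returns 0 as soon as the path exists, 1 if the timeout (seconds) expires first.
wait_for_path() {
    path="$1"
    timeout="${2:-120}"
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical helper: clear the failed state / start rate limit of each
# neutron agent unit and restart it. Unit names other than
# tripleo_neutron_l3_agent are ASSUMED from the container names.
restart_neutron_agents() {
    for unit in tripleo_neutron_ovs_agent tripleo_neutron_l3_agent tripleo_neutron_dhcp; do
        systemctl reset-failed "$unit.service"
        systemctl restart "$unit.service"
    done
}
```

Usage, run as root after boot: `wait_for_path /run/openvswitch 300 && restart_neutron_agents`. Waiting on /run/openvswitch itself mirrors the exact path the container mount fails on, rather than guessing at how long openvswitch takes to come up.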
Could this be a missing start dependency between openvswitch and the neutron containers? With docker, docker.service is After=network.target and openvswitch.service is PartOf=network.target, so the docker containers always start after openvswitch.
Brent, can you comment on whether this is more on the Director side or the OVS side? It seems to me that OVS would need to keep its existing target for the generic case, so a different, specialized target may be needed.
This is on the deployment side. The service files that launch the neutron agents should declare dependencies that pull in openvswitch and order the agents after it.
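One way such a dependency could be expressed is a systemd drop-in per agent unit. The sketch below is illustrative, not the deployment tooling's actual fix. It assumes the units are named tripleo_<container>.service (only tripleo_neutron_l3_agent.service is confirmed by the logs) and that openvswitch.service is the unit that brings up ovsdb-server/ovs-vswitchd and therefore /run/openvswitch. The helper write_ovs_dropins and the file name 10-openvswitch.conf are hypothetical.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in for each neutron agent unit so that it
# pulls in openvswitch.service (Wants=) and starts only after it (After=).
# ASSUMPTIONS: unit names follow the tripleo_<container>.service pattern;
# the drop-in file name 10-openvswitch.conf is hypothetical.
set -eu

write_ovs_dropins() {
    # Target directory; defaults to the administrator unit directory.
    unitdir="${1:-/etc/systemd/system}"
    for unit in tripleo_neutron_ovs_agent tripleo_neutron_l3_agent tripleo_neutron_dhcp; do
        dir="$unitdir/$unit.service.d"
        mkdir -p "$dir"
        # Wants= starts openvswitch.service whenever this unit is started;
        # After= orders this unit's start after openvswitch.service.
        cat > "$dir/10-openvswitch.conf" <<'EOF'
[Unit]
Wants=openvswitch.service
After=openvswitch.service
EOF
    done
}
```

Usage, as root: `write_ovs_dropins && systemctl daemon-reload`. Wants= is used rather than Requires= so that an explicit stop or restart of openvswitch.service does not also stop the agents; Requires= is the stricter alternative if the agents should fail outright when openvswitch cannot start. A permanent fix would presumably live in the unit files the deployment generates rather than in hand-written drop-ins.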