Description of problem:
If a neutron-*-agent child process dies, the agent does not notice. On RHEL 6 we have neutron-agent-watch to handle this, but it cannot be used with systemd. I'm pushing this implementation in oslo: https://review.openstack.org/#/c/97748/ to get systemd reporting back to neutron, but systemd seems to have ERRNO NOTIFY_SOCKET handling and reporting unimplemented (bz#1106457).

Version-Release number of selected component (if applicable):
openstack-neutron-2014.1-26.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Log in to the neutron network node.
2. killall dnsmasq
3.

Actual results:
neutron-dhcp-agent does not notice that dnsmasq died until the affected networks are changed and the dnsmasq child process is restarted to pick up a new configuration for a tenant network.

Expected results:
neutron-*-agent reports an error condition via systemctl status, or quits.

Additional info:
How to test this:

1) With a working deployment, modify l3_agent.ini and dhcp_agent.ini to include:

   check_child_processes_action = respawn
   check_child_processes_interval = 5

2) Restart the l3 and dhcp agents.

3) Spawn resources (a VM connected to a private tenant network).

4) tail -f /var/log/neutron/dhcp_agent.log & \
   tail -f /var/log/neutron/l3_agent.log &

5) sudo killall dnsmasq

   You should then see something like:

   2014-10-09 04:31:46.434 9651 ERROR neutron.agent.linux.external_process [-] dnsmasq for dhcp with uuid 67f3c1d9-5861-4466-899f-f166aa97a173 not found. The process should not have died
   2014-10-09 04:31:46.434 9651 ERROR neutron.agent.linux.external_process [-] respawning dnsmasq for uuid 67f3c1d9-5861-4466-899f-f166aa97a173

6) sudo killall neutron-ns-metadata-proxy

   You should see something like:

   2014-10-09 04:33:06.564 9656 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid a539a2f8-a6ec-41d1-91b0-bf2ca780b644 not found. The process should not have died
   2014-10-09 04:33:06.564 9656 ERROR neutron.agent.linux.external_process [-] respawning None for uuid a539a2f8-a6ec-41d1-91b0-bf2ca780b644

7) Modify l3_agent.ini and dhcp_agent.ini to include:

   check_child_processes_action = exit
   check_child_processes_interval = 5

8) Repeat 4-6, but in this case the agent should exit.

9) Repeat all of the above with check_child_processes_interval = 0. Nothing should happen: no service is restarted automatically and no message is logged.
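For anyone reading along who wants a mental model of what the two options do, here is a rough Python sketch of the behavior the steps above exercise. It is not the neutron code: ChildMonitor, spawn, check_once and run are hypothetical names made up for illustration, and the messages it prints are not the real log lines. It only assumes what the steps state: "respawn" restarts a dead child, "exit" makes the agent quit, and an interval of 0 disables the check.

```python
import subprocess
import sys
import time


class ChildMonitor:
    """Hypothetical child-process watcher (illustration only, not neutron's)."""

    def __init__(self, action="respawn", interval=5):
        if action not in ("respawn", "exit"):
            raise ValueError("action must be 'respawn' or 'exit'")
        self.action = action        # mirrors check_child_processes_action
        self.interval = interval    # mirrors check_child_processes_interval
        self.children = {}          # uuid -> (argv, Popen)

    def spawn(self, uuid, argv):
        """Start (or restart) the child registered under this uuid."""
        self.children[uuid] = (argv, subprocess.Popen(argv))

    def check_once(self):
        """Run one check pass and return the uuids whose child had died."""
        dead = []
        for uuid, (argv, proc) in list(self.children.items()):
            if proc.poll() is None:
                continue  # child is still running
            dead.append(uuid)
            print("child for uuid %s not found" % uuid, file=sys.stderr)
            if self.action == "respawn":
                self.spawn(uuid, argv)
            else:
                # action == "exit": the whole agent quits
                raise SystemExit(1)
        return dead

    def run(self):
        """Check periodically; an interval of 0 (or less) disables checking."""
        if self.interval <= 0:
            return
        while True:
            time.sleep(self.interval)
            self.check_once()
```

To try it, register any long-running command with spawn(), kill that command from another shell, and call check_once(): in respawn mode a new child is started under the same uuid, and in exit mode SystemExit is raised.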
In between step 7 and 8, it should say to restart the neutron-l3-agent and neutron-dhcp-agent. Otherwise, I ran through these steps and verified the expected behavior.
(In reply to Sean Toner from comment #7)
> In between step 7 and 8, it should say to restart the neutron-l3-agent and
> neutron-dhcp-agent.
>
> Otherwise, I ran through these steps and verified the expected behavior.

Correct, I forgot to mention that step. Thank you for testing!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1786.html