Bug 1106489 - neutron-*-agent child processes can die unnoticed
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: z2
Target Release: 5.0 (RHEL 7)
Assigned To: Miguel Angel Ajo
QA Contact: Ofer Blaut
Keywords: Regression, ZStream
Depends On: 1065172 1106457
Blocks: 1083890

Reported: 2014-06-09 08:54 EDT by Miguel Angel Ajo
Modified: 2016-04-26 14:19 EDT
CC List: 7 users

Fixed In Version: openstack-neutron-2014.1.3-4.el7ost
Doc Type: Bug Fix
Last Closed: 2014-11-03 03:38:17 EST
Type: Bug

External Trackers:
- OpenStack gerrit 105999
- OpenStack gerrit 97748
- Red Hat Product Errata RHSA-2014:1786 (priority: normal, status: SHIPPED_LIVE): "Moderate: openstack-neutron security, bug fix, and enhancement update", last updated 2014-11-03 08:36:33 EST

Description Miguel Angel Ajo 2014-06-09 08:54:40 EDT
Description of problem:

  If neutron-*-agent child processes die, the agents don't notice. On RHEL 6 we have neutron-agent-watch to handle this, but it can't be used with systemd.

  I'm pushing this implementation to oslo for review (https://review.openstack.org/#/c/97748/) so that neutron can report its status back to systemd, but systemd appears to leave ERRNO handling and reporting over NOTIFY_SOCKET unimplemented (bz#1106457).

Version-Release number of selected component (if applicable):
openstack-neutron-2014.1-26.el7ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. Log in to the neutron network node.
2. killall dnsmasq

Actual results:

neutron-dhcp-agent does not notice that dnsmasq has died. The dnsmasq child process is only started again when an affected tenant network is changed and the agent restarts dnsmasq to pick up the new configuration.

Expected results:

The neutron-*-agent reports an error condition via systemctl status, or exits.

Additional info:
Comment 4 Miguel Angel Ajo 2014-10-09 05:14:25 EDT
How to test this:

1) With a working deployment, modify l3_agent.ini and dhcp_agent.ini to include:

check_child_processes_action = respawn
check_child_processes_interval = 5

2) Restart the L3 and DHCP agents.

3) Create resources (for example, a VM connected to a private tenant network).

4) tail -f /var/log/neutron/dhcp_agent.log & \
   tail -f /var/log/neutron/l3_agent.log &

5) sudo killall dnsmasq

You should then see something like:

2014-10-09 04:31:46.434 9651 ERROR neutron.agent.linux.external_process [-] dnsmasq for dhcp with uuid 67f3c1d9-5861-4466-899f-f166aa97a173 not found. The process should not have died
2014-10-09 04:31:46.434 9651 ERROR neutron.agent.linux.external_process [-] respawning dnsmasq for uuid 67f3c1d9-5861-4466-899f-f166aa97a173


6) sudo killall neutron-ns-metadata-proxy

You should see something like:

2014-10-09 04:33:06.564 9656 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid a539a2f8-a6ec-41d1-91b0-bf2ca780b644 not found. The process should not have died
2014-10-09 04:33:06.564 9656 ERROR neutron.agent.linux.external_process [-] respawning None for uuid a539a2f8-a6ec-41d1-91b0-bf2ca780b644


7) Modify l3_agent.ini and dhcp_agent.ini to include:

check_child_processes_action = exit
check_child_processes_interval = 5

8) Repeat steps 4-6; this time the agents should exit.

9) Repeat all of the above with check_child_processes_interval = 0. This time nothing should happen: no process is respawned automatically and no message is logged.
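The options exercised above describe a periodic check: every check_child_processes_interval seconds the agent looks for the children it manages and, if one is gone, either respawns it (respawn) or exits (exit); an interval of 0 disables the check. The following is a minimal bash sketch of that behaviour under stated assumptions, not the Neutron implementation (which is Python). The function names child_alive, check_child and monitor_child are hypothetical, and liveness is simplified to a kill -0 test on a single PID.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the check_child_processes_action/_interval behaviour.
# This is NOT Neutron code; names and the kill -0 liveness test are assumptions.

# Succeed if a process with the given PID still exists.
child_alive() {
    kill -0 "$1" 2>/dev/null
}

# One check: print "ok", "respawn" or "exit" for a child PID, given the
# configured action ("respawn" or "exit").
check_child() {
    local pid=$1 action=$2
    if child_alive "$pid"; then
        echo ok
    elif [ "$action" = "respawn" ]; then
        echo respawn
    else
        echo exit
    fi
}

# Start "$@" in the background and check it every $interval seconds.
# An interval of 0 disables checking (the child is started but never watched).
monitor_child() {
    local action=$1 interval=$2
    shift 2
    "$@" &
    local pid=$!
    [ "$interval" -gt 0 ] || return 0
    while sleep "$interval"; do
        case $(check_child "$pid" "$action") in
            respawn)
                echo "child $pid not found, respawning" >&2
                "$@" &
                pid=$!
                ;;
            exit)
                # Exiting lets a supervisor such as systemd see the failure.
                echo "child $pid not found, exiting" >&2
                return 1
                ;;
        esac
    done
}
```

For example, monitor_child exit 5 dnsmasq --keep-in-foreground would return non-zero within about five seconds of dnsmasq being killed, while monitor_child respawn 5 ... would restart it instead.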
Comment 7 Sean Toner 2014-10-09 11:52:25 EDT
Between steps 7 and 8, it should say to restart the neutron-l3-agent and neutron-dhcp-agent.

Otherwise, I ran through these steps and verified the expected behavior.
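To make the log check in steps 5 and 6 repeatable, a small helper can count the respawn messages quoted in the sample log lines above. This is a hypothetical bash function (count_respawns is not part of Neutron); it only matches the "external_process ... respawning" text shown in those samples, optionally filtered by a resource uuid, and the log paths are the ones used in step 4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count respawn messages logged by
# neutron.agent.linux.external_process, based on the sample lines in this bug.
# $1: log file, $2 (optional): network/router uuid to filter on.
count_respawns() {
    local logfile=$1 uuid=${2:-}
    # An empty uuid pattern matches every line, so no filter is applied.
    # grep -c prints 0 when nothing matches.
    grep 'neutron.agent.linux.external_process' "$logfile" \
        | grep 'respawning' \
        | grep -c -- "$uuid"
}
```

Usage: count_respawns /var/log/neutron/dhcp_agent.log 67f3c1d9-5861-4466-899f-f166aa97a173 should increase by one each time dnsmasq for that network is killed and respawned.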
Comment 8 Miguel Angel Ajo 2014-10-10 04:26:29 EDT
(In reply to Sean Toner from comment #7)
> In between step 7 and 8, it should say to restart the neutron-l3-agent and
> neutron-dhcp-agent.
> 
> Otherwise, I ran through these steps and verified the expected behavior.

Correct, I forgot to mention that step.

Thank you for testing!
Comment 10 errata-xmlrpc 2014-11-03 03:38:17 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1786.html
