Bug 1106489 - neutron-*-agent child processes can die unnoticed
Summary: neutron-*-agent child processes can die unnoticed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: z2
Target Release: 5.0 (RHEL 7)
Assignee: Miguel Angel Ajo
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On: 1065172 1106457
Blocks: 1083890
Reported: 2014-06-09 12:54 UTC by Miguel Angel Ajo
Modified: 2022-07-09 07:17 UTC (History)
CC List: 7 users

Fixed In Version: openstack-neutron-2014.1.3-4.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-03 08:38:17 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 97748 | 0 | 'None' | ABANDONED | Add service-status-interface spec | 2020-07-13 05:27:47 UTC
OpenStack gerrit | 105999 | 0 | 'None' | MERGED | Add agent-child-processes-status blueprint | 2020-07-13 05:27:47 UTC
Red Hat Issue Tracker | OSP-16564 | 0 | None | None | None | 2022-07-09 07:17:10 UTC
Red Hat Product Errata | RHSA-2014:1786 | 0 | normal | SHIPPED_LIVE | Moderate: openstack-neutron security, bug fix, and enhancement update | 2014-11-03 13:36:33 UTC

Description Miguel Angel Ajo 2014-06-09 12:54:40 UTC
Description of problem:

  If neutron-*-agent child processes die, the agents won't notice. On RHEL 6 we have neutron-agent-watch to handle this, but it can't be used with systemd.

  I'm pushing an implementation to oslo (https://review.openstack.org/#/c/97748/) to get systemd reporting back to neutron, but systemd does not seem to implement ERRNO handling and reporting over NOTIFY_SOCKET (bz#1106457).

Version-Release number of selected component (if applicable):
openstack-neutron-2014.1-26.el7ost.noarch

How reproducible:

100%

Steps to Reproduce:
1. Log in to the neutron network node
2. killall dnsmasq

Actual results:

neutron-dhcp-agent does not notice that dnsmasq has died. The dnsmasq child process is only restarted when the affected networks change and it has to pick up a new configuration for a tenant network.

Expected results:

neutron-*-agent reports an error condition via systemctl status, or quits.

Additional info:
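
To illustrate the kind of check that is missing here, below is a minimal Python sketch of testing whether a process with a given PID still exists. It is not neutron code: the helper name `pid_is_alive` is hypothetical, and it assumes the PID of the child (for example dnsmasq) is already known to the caller.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID exists (hypothetical helper)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 only performs existence and permission checks;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        # No such process: the child has died (and been reaped).
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True
```

A check like this only answers "is something running under that PID"; it cannot tell whether the PID was reused by an unrelated process, so a real monitor would likely also verify the command line.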

Comment 4 Miguel Angel Ajo 2014-10-09 09:14:25 UTC
How to test this:

1) With a working deployment, modify l3_agent.ini and dhcp_agent.ini to include:

check_child_processes_action = respawn
check_child_processes_interval = 5

2) Restart the l3 and dhcp agents.

3) Spawn resources (a VM connected to a private tenant network)

4) tail -f /var/log/neutron/dhcp_agent.log & \
   tail -f /var/log/neutron/l3_agent.log &

5) sudo killall dnsmasq

You should then see something like:

2014-10-09 04:31:46.434 9651 ERROR neutron.agent.linux.external_process [-] dnsmasq for dhcp with uuid 67f3c1d9-5861-4466-899f-f166aa97a173 not found. The process should not have died
2014-10-09 04:31:46.434 9651 ERROR neutron.agent.linux.external_process [-] respawning dnsmasq for uuid 67f3c1d9-5861-4466-899f-f166aa97a173


6) sudo killall neutron-ns-metadata-proxy

You should see something like:

2014-10-09 04:33:06.564 9656 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid a539a2f8-a6ec-41d1-91b0-bf2ca780b644 not found. The process should not have died
2014-10-09 04:33:06.564 9656 ERROR neutron.agent.linux.external_process [-] respawning None for uuid a539a2f8-a6ec-41d1-91b0-bf2ca780b644


7) Modify l3_agent.ini and dhcp_agent.ini to include:

check_child_processes_action = exit
check_child_processes_interval = 5

8) Repeat steps 4-6; in this case the agent should exit.

9) Repeat all of the above with check_child_processes_interval = 0. Nothing should happen: no service is restarted automatically and no message is logged.
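
The behaviour exercised by the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code shipped in openstack-neutron: the class `ChildMonitor` and its methods are hypothetical, children are tracked as `subprocess.Popen` objects for simplicity, and only the two option names and their values (`respawn`, `exit`, and an interval of 0 meaning "disabled") come from this report.

```python
import subprocess
import sys
import time


class ChildMonitor:
    """Hypothetical child-process monitor; not neutron's implementation."""

    def __init__(self, action="respawn", interval=5):
        # Mirrors check_child_processes_action / check_child_processes_interval.
        if action not in ("respawn", "exit"):
            raise ValueError("action must be 'respawn' or 'exit'")
        self.action = action
        self.interval = interval
        self.children = {}  # name -> (argv, Popen object)

    def spawn(self, name, argv):
        """Start (or restart) a child and remember how it was started."""
        self.children[name] = (argv, subprocess.Popen(argv))

    def check_once(self):
        """Check every child once and return the names found dead."""
        if self.interval <= 0:
            return []  # an interval of 0 disables monitoring entirely
        dead = []
        for name, (argv, proc) in list(self.children.items()):
            if proc.poll() is None:
                continue  # child is still running
            dead.append(name)
            print(f"{name} not found. The process should not have died",
                  file=sys.stderr)
            if self.action == "respawn":
                self.spawn(name, argv)
            else:
                # action == "exit": stop the agent so the failure is visible.
                raise SystemExit(f"child {name!r} died; exiting")
        return dead

    def run(self):
        """Poll the children every `interval` seconds until disabled."""
        while self.interval > 0:
            self.check_once()
            time.sleep(self.interval)
```

With `action="exit"` the agent process itself terminates, which is what makes the failure visible to the service manager; with `action="respawn"` the failure only shows up in the log.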

Comment 7 Sean Toner 2014-10-09 15:52:25 UTC
In between step 7 and 8, it should say to restart the neutron-l3-agent and neutron-dhcp-agent.

Otherwise, I ran through these steps and verified the expected behavior.

Comment 8 Miguel Angel Ajo 2014-10-10 08:26:29 UTC
(In reply to Sean Toner from comment #7)
> In between step 7 and 8, it should say to restart the neutron-l3-agent and
> neutron-dhcp-agent.
> 
> Otherwise, I ran through these steps and verified the expected behavior.

Correct, I forgot to mention that step.

Thank you for testing!

Comment 10 errata-xmlrpc 2014-11-03 08:38:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2014-1786.html

