Bug 1339014 - Neutron metadata agent workers are not properly shut down when 'systemctl stop neutron-metadata-agent' is issued
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: async
Target Release: 8.0 (Liberty)
Assignee: Bernard Cafarelli
QA Contact: Alexander Stafeyev
URL:
Whiteboard: hot
Depends On:
Blocks: 1194008 1295530
Reported: 2016-05-23 23:57 UTC by kahou
Modified: 2019-12-16 05:50 UTC (History)

Fixed In Version: openstack-neutron-7.1.1-4.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-24 13:30:03 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 331672 | 0 | None | None | None | 2016-07-26 14:43:06 UTC
Red Hat Product Errata | RHBA-2016:1770 | 0 | normal | SHIPPED_LIVE | openstack-neutron bug fix advisory | 2016-08-24 17:29:15 UTC

Description kahou 2016-05-23 23:57:09 UTC
Description of problem:

I have 15 neutron metadata agent processes running in my cluster. When I issue systemctl stop neutron-metadata-agent, systemctl hangs for a while and some of the neutron metadata child processes are not cleaned up properly.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Start the neutron metadata agent with 15 metadata_workers. The configuration values are specified in metadata_agent.ini
2. Make sure all 15 metadata processes are running
3. Run systemctl stop neutron-metadata-agent
4. systemctl hangs for a while. Once systemctl finishes, run ps aux | grep metadata. Some of the metadata processes have not been cleaned up
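
The check in step 4 can be scripted so that leftover workers are counted rather than eyeballed. This is only a sketch: the function name count_metadata_procs is hypothetical, and the match pattern assumes the workers show "neutron-metadata-agent" in their command line, which may differ between installs.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that look like metadata agent
# processes. The pattern is an assumption about how the agent appears in ps.
count_metadata_procs() {
    # The [n] bracket keeps grep's own command line from matching when the
    # input comes from `ps | grep`. grep -c prints 0 and exits non-zero when
    # nothing matches, so `|| true` keeps the function's status at 0.
    grep -c '[n]eutron-metadata-agent' || true
}

# Usage sketch on a host running the agent (not executed here):
#   systemctl stop neutron-metadata-agent
#   ps -eo pid,ppid,stat,args | count_metadata_procs
```

A non-zero count after the stop has returned would match the behaviour described in this report.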


Actual results:

systemctl hangs and some of the child metadata processes are not cleaned up

Expected results:

Running systemctl stop/restart neutron-metadata-agent should not hang, and all the metadata processes should be cleaned up.


Additional info:

Comment 1 kahou 2016-05-24 07:30:01 UTC
If I run strace -p <main process id>, I see the main process looping on wait4(0, 0x7fff51d0d6b4, WNOHANG, NULL) = 0

Comment 2 kahou 2016-05-24 17:11:10 UTC
Please note that the service was originally managed by pacemaker, which uses systemd to start/stop/restart the service. To make debugging simpler, I used systemctl directly to reproduce the issue.

Comment 24 Joe Donohue 2016-07-19 16:54:28 UTC
Any ETA on this one?

Comment 25 Bernard Cafarelli 2016-07-20 09:55:59 UTC
Waiting for customer feedback on whether the upstream fix https://review.openstack.org/#/c/331672/ (provided in a test package) solves this bug in their environment.

In the meantime, this workaround proved to work:
* change the KillMode value to "control-group" in /usr/lib/systemd/system/neutron-metadata-agent.service
* run: "systemctl daemon-reload"
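
The same workaround could also be applied as a systemd drop-in, so that the packaged unit file under /usr/lib is left untouched and the override survives package updates. This is a sketch under assumptions, not what was tested in this bug: the function name write_killmode_dropin and the file name killmode.conf are hypothetical, and the drop-in directory shown in the usage comment follows the usual systemd convention for this unit name.

```shell
#!/bin/sh
# Hypothetical helper: write a KillMode override as a systemd drop-in.
# $1: drop-in directory for the unit.
write_killmode_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # killmode.conf is an arbitrary name; systemd reads any *.conf file
    # found in the unit's drop-in directory.
    printf '[Service]\nKillMode=control-group\n' > "$dropin_dir/killmode.conf"
}

# Usage sketch on a real host, as root (not executed here):
#   write_killmode_dropin /etc/systemd/system/neutron-metadata-agent.service.d
#   systemctl daemon-reload
```

With KillMode=control-group, systemd signals every process remaining in the unit's control group on stop, not only the main process.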

Comment 26 Charles Crouch 2016-07-20 14:27:30 UTC
(In reply to Bernard Cafarelli from comment #25)
> Waiting for customer feedback on whether the upstream fix
> https://review.openstack.org/#/c/331672/ (provided in a test package) solves
> this bug in their environment.

Hi Bernard
This is the corresponding support case, right? https://access.redhat.com/support/cases/#/case/01640942
I'm not seeing a test package attached there. Could you add it to the case?

Thanks
Charles

Comment 29 Charles Crouch 2016-07-26 16:54:14 UTC
From support case from Kahou: "We have verified that the patch works. By applying the change, we don't see any more zombie child process anymore even we are using "process" as kill-mode."

So (thumbsup) from us :-)

Comment 30 Bernard Cafarelli 2016-07-27 12:02:37 UTC
Thanks for the test! That confirms that https://review.openstack.org/#/c/331672/ fixes this bug. We will review and integrate the change.

Comment 34 errata-xmlrpc 2016-08-24 13:30:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1770.html

