Bug 1339014

Summary:	Neutron metadata agent workers are not properly shut down when 'systemctl stop neutron-metadata-agent' is issued
Product:	Red Hat OpenStack	Reporter:	kahou <kalei>
Component:	openstack-neutron	Assignee:	Bernard Cafarelli <bcafarel>
Status:	CLOSED ERRATA	QA Contact:	Alexander Stafeyev <astafeye>
Severity:	high	Docs Contact:
Priority:	high
Version:	8.0 (Liberty)	CC:	aathomas, amuller, bcafarel, charcrou, chrisw, jdonohue, nyechiel, skulkarn, srevivo, tfreger
Target Milestone:	async	Keywords:	ZStream
Target Release:	8.0 (Liberty)
Hardware:	Unspecified
OS:	Linux
Whiteboard:	hot
Fixed In Version:	openstack-neutron-7.1.1-4.el7ost	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-08-24 13:30:03 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1194008, 1295530

Description kahou 2016-05-23 23:57:09 UTC

Description of problem:

I have 15 neutron metadata agent processes running in my cluster. When I issue systemctl stop neutron-metadata-agent, systemctl hangs for a while and some of the neutron metadata child processes are not cleaned up properly.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Start the neutron metadata agent with 15 metadata_workers. The value is set in metadata_agent.ini.
2. Make sure all 15 metadata processes are running.
3. Run systemctl stop neutron-metadata-agent.
4. systemctl hangs for a while. Once it finishes, run ps aux | grep metadata. Some of the metadata processes have not been cleaned up.
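
Step 1 corresponds to a setting like the one below. This is only a sketch: the report names the option and the file, so the [DEFAULT] section and the /etc/neutron/ location are assumptions based on the usual Liberty-era layout.

```ini
# /etc/neutron/metadata_agent.ini (path assumed, not stated in the report)
[DEFAULT]
# Number of separate worker processes for the metadata server.
metadata_workers = 15
```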


Actual results:

systemctl hangs and some of the child metadata processes are not cleaned up

Expected results:

Running systemctl stop/restart neutron-metadata-agent should not hang, and all the metadata processes should be cleaned up.


Additional info:

Comment 1 kahou 2016-05-24 07:30:01 UTC

If I run strace -p <main process id>, I see it looping on wait4(0, 0x7fff51d0d6b4, WNOHANG, NULL) = 0
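
The strace output above is consistent with a parent that polls for exited children without blocking: wait4 with WNOHANG returns 0 when children exist but none has changed state. The sketch below reproduces that return value with Python's os.wait4. It illustrates the system call's behavior only, not the agent's actual code, and the helper names (poll_children, demo) are hypothetical.

```python
import os
import signal
import time


def poll_children():
    """One non-blocking wait4 call, mirroring wait4(0, ..., WNOHANG, NULL)."""
    pid, _status, _rusage = os.wait4(0, os.WNOHANG)
    return pid  # 0 means: children exist, but none has exited yet


def demo():
    child = os.fork()
    if child == 0:
        # Child: stand-in for a worker that has not been told to stop.
        time.sleep(60)
        os._exit(0)

    # The child is still running, so the poll returns 0, as in the trace.
    while_running = poll_children()

    # Once the child is actually signalled, a later poll reaps it.
    os.kill(child, signal.SIGTERM)
    reaped = 0
    while reaped == 0:
        time.sleep(0.05)
        reaped = poll_children()
    return while_running, reaped == child


if __name__ == "__main__":
    print(demo())
```

If the stop signal never reaches a child, a loop like this keeps seeing 0, which would match the pattern in the trace.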

Comment 2 kahou 2016-05-24 17:11:10 UTC

Please note that the service was originally managed by pacemaker, which uses systemd to start/stop/restart the service. To make debugging simpler, I used systemctl directly to reproduce the issue.

Comment 24 Joe Donohue 2016-07-19 16:54:28 UTC

Any ETA on this one?

Comment 25 Bernard Cafarelli 2016-07-20 09:55:59 UTC

Waiting for customer feedback on whether the upstream fix https://review.openstack.org/#/c/331672/ (provided in a test package) solves this bug in their environment.

In the meantime, this workaround proved to work:
* change the KillMode value to "control-group" in /usr/lib/systemd/system/neutron-metadata-agent.service
* run: "systemctl daemon-reload"
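
The workaround above amounts to a one-line change in the unit's [Service] section. A sketch of the resulting fragment, assuming the shipped unit sets KillMode=process (the value mentioned in comment 29) and leaving the rest of the unit as shipped:

```ini
[Service]
# Assumption: the shipped unit has KillMode=process, so systemd signals only
# the main process on stop. With control-group, systemd signals every
# process remaining in the unit's control group, including the workers.
KillMode=control-group
```

Files under /usr/lib/systemd/system belong to the package, so a manual edit there is likely to be reverted by the next update. A drop-in such as /etc/systemd/system/neutron-metadata-agent.service.d/killmode.conf (a hypothetical file name) containing the same two settings lines is a possible alternative, followed by the same systemctl daemon-reload.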

Comment 26 Charles Crouch 2016-07-20 14:27:30 UTC

(In reply to Bernard Cafarelli from comment #25)
> Waiting for customer feedback on whether the upstream fix
> https://review.openstack.org/#/c/331672/ (provided in a test package) solves
> this bug in their environment.

Hi Bernard
This is the corresponding support case, right? https://access.redhat.com/support/cases/#/case/01640942
I'm not seeing a test package attached there. Could you add it to the case?

Thanks
Charles

Comment 29 Charles Crouch 2016-07-26 16:54:14 UTC

From support case from Kahou: "We have verified that the patch works. By applying the change, we don't see any more zombie child process anymore even we are using "process" as kill-mode."

So (thumbsup) from us :-)

Comment 30 Bernard Cafarelli 2016-07-27 12:02:37 UTC

Thanks for the test! That confirms https://review.openstack.org/#/c/331672/ fixes this bug. We will review and integrate the change.

Comment 34 errata-xmlrpc 2016-08-24 13:30:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1770.html