1339014 – Neutron metadata agent workers are not properly shut down when 'systemctl stop neutron-metadata-agent' is issued

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-neutron
Sub Component:
Version:	8.0 (Liberty)
Hardware:	Unspecified
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	async
Target Release:	8.0 (Liberty)
Assignee:	Bernard Cafarelli
QA Contact:	Alexander Stafeyev
Docs Contact:
URL:
Whiteboard:	hot
Depends On:
Blocks:	1194008 1295530

Reported:	2016-05-23 23:57 UTC by kahou
Modified:	2019-12-16 05:50 UTC
CC List:	10 users
Fixed In Version:	openstack-neutron-7.1.1-4.el7ost
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-08-24 13:30:03 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	331672	0	None	None	None	2016-07-26 14:43:06 UTC
Red Hat Product Errata	RHBA-2016:1770	0	normal	SHIPPED_LIVE	openstack-neutron bug fix advisory	2016-08-24 17:29:15 UTC

Description kahou 2016-05-23 23:57:09 UTC

Description of problem:

I have 15 neutron metadata agent processes running in my cluster. When I issue systemctl stop neutron-metadata-agent, systemctl hangs for a while and some of the neutron metadata child processes are not cleaned up properly.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Start the neutron metadata agent with 15 metadata_workers. The configuration values are specified in metadata_agent.ini.
2. Make sure all 15 metadata processes are running.
3. Run systemctl stop neutron-metadata-agent.
4. systemctl hangs for a while. Once systemctl finishes, run ps aux | grep metadata. Some of the metadata processes are still running.
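
The check in step 4 can be scripted. The following is a minimal sketch, not part of the original report: the function name, the sample ps output, and the command path in it are hypothetical and only illustrate the filtering.

```python
from typing import List


def leftover_agent_lines(ps_output: str, needle: str = "metadata") -> List[str]:
    """Return the `ps aux` lines whose command mentions `needle`."""
    leftovers = []
    for line in ps_output.splitlines():
        # `ps aux` prints 11 columns; everything after the tenth is the command.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and malformed lines
        command = fields[10]
        # Ignore the grep process itself, as `ps aux | grep metadata` would show it.
        if needle in command and not command.startswith("grep"):
            leftovers.append(line)
    return leftovers


# Hypothetical sample output, made up for illustration only.
SAMPLE_PS = """USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
neutron   4321  0.0  0.5 300000 50000 ?        S    10:00   0:01 /usr/bin/python2 /usr/bin/neutron-metadata-agent
root      4400  0.0  0.0 112000   900 pts/0    S+   10:05   0:00 grep metadata
"""
```

On a live host the input could come from subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout. A non-empty result after systemctl stop returns would indicate leftover workers.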


Actual results:

systemctl hangs and some of the child metadata processes are not cleaned up

Expected results:

Running systemctl stop/restart neutron-metadata-agent should not hang, and all the metadata processes should be cleaned up.


Additional info:

Comment 1 kahou 2016-05-24 07:30:01 UTC

If I run strace -p <main process id>, I see the main process looping on wait4(0, 0x7fff51d0d6b4, WNOHANG, NULL) = 0
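
For context, a wait4 call with WNOHANG returns 0 when matching children exist but none has changed state yet, so a parent that keeps polling this way will spin for as long as its children stay alive. A minimal sketch of that return value follows, assuming a POSIX system. The function name is hypothetical, and this is not the agent's actual code.

```python
import os
import subprocess
import sys


def poll_running_child():
    """Poll a still-running child without blocking and return the raw result."""
    # Start a child that stays alive long enough to be polled.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # The strace in this report shows pid 0 (any child in the caller's
        # process group); this sketch polls only its own child to stay contained.
        return os.waitpid(child.pid, os.WNOHANG)
    finally:
        # Clean up so the sketch leaves no stray process behind.
        child.kill()
        child.wait()
```

While the child is alive, the call returns (0, 0), which corresponds to the "= 0" in the strace output above.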

Comment 2 kahou 2016-05-24 17:11:10 UTC

Please note that the service was originally managed by pacemaker, which uses systemd to start/stop/restart the service. To simplify debugging, I used systemctl directly to reproduce the issue.

Comment 24 Joe Donohue 2016-07-19 16:54:28 UTC

Any ETA on this one?

Comment 25 Bernard Cafarelli 2016-07-20 09:55:59 UTC

Waiting for customer feedback on whether the upstream fix https://review.openstack.org/#/c/331672/ (provided in a test package) solves this bug in their environment.

In the meantime, this workaround proved to work:
* change the KillMode value to "control-group" in /usr/lib/systemd/system/neutron-metadata-agent.service
* run: "systemctl daemon-reload"
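
To confirm which KillMode a unit file declares before and after applying the workaround, something like the following could be used. This is a sketch only: the function name and the sample unit text (including its ExecStart path and KillMode value) are hypothetical, and it reads a single file without accounting for systemd drop-in overrides. The fallback reflects systemd's documented default of control-group when KillMode is not set.

```python
import configparser


def kill_mode(unit_text: str) -> str:
    """Return the KillMode declared in a unit file's [Service] section."""
    # strict=False tolerates repeated keys, which unit files may contain;
    # interpolation=None keeps '%' specifiers from being interpreted.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(unit_text)
    return parser.get("Service", "KillMode", fallback="control-group")


# Hypothetical unit text, made up for illustration only.
SAMPLE_UNIT = """[Unit]
Description=Example metadata agent unit (hypothetical)

[Service]
ExecStart=/usr/bin/neutron-metadata-agent
KillMode=process
"""
```

With control-group, systemd signals every process remaining in the unit's control group on stop, whereas process signals only the main process.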

Comment 26 Charles Crouch 2016-07-20 14:27:30 UTC

(In reply to Bernard Cafarelli from comment #25)
> Waiting for customer feedback on whether the upstream fix
> https://review.openstack.org/#/c/331672/ (provided in a test package) solves
> this bug in their environment.

Hi Bernard
This is the corresponding support case, right: https://access.redhat.com/support/cases/#/case/01640942 ?
I'm not seeing a test package attached there. Could you add it to the case?

Thanks
Charles

Comment 29 Charles Crouch 2016-07-26 16:54:14 UTC

From support case from Kahou: "We have verified that the patch works. By applying the change, we don't see any more zombie child process anymore even we are using "process" as kill-mode."

So (thumbsup) from us :-)

Comment 30 Bernard Cafarelli 2016-07-27 12:02:37 UTC

Thanks for the test! That confirms https://review.openstack.org/#/c/331672/ fixes this bug. We will review and integrate the change.

Comment 34 errata-xmlrpc 2016-08-24 13:30:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1770.html
