Bug 1330778

Summary: Deadlock occurs when we turn on syslog and fork a process
Product: Red Hat OpenStack Reporter: kahou <kalei>
Component: openstack-neutron    Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED ERRATA QA Contact: Toni Freger <tfreger>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.0 (Juno)    CC: aathomas, amuller, astafeye, charcrou, cheburge, chrisw, jdonohue, jlibosva, nyechiel, oblaut, pviktori, skulkarn, srevivo, wlehman
Target Milestone: async    Keywords: Triaged, ZStream
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: hot
Fixed In Version: openstack-neutron-7.0.4-8.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1346759 (view as bug list) Environment:
Last Closed: 2016-07-20 23:55:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1194008    
Attachments (flags: none):
  Neutron server runs into deadlock when syslog config is enabled
  logging conf which run with deadlock_syslog.py
  script which reproduces the issue with syslog handler

Description kahou 2016-04-26 23:51:47 UTC
Created attachment 1151143 [details]
Neutron server runs into deadlock when syslog config is enabled

Description of problem:

I was trying to turn on syslog and the corresponding configuration (see attached statement) for the Neutron server, but it doesn't create any consumer for the QPLUGIN queue. This causes all the Neutron agents to stop functioning, as they cannot communicate with the Neutron server.

Digging deeper, it looks like the Neutron server forks the RPC workers when it creates the QPLUGIN consumer. Since we use SysLogHandler, which takes a lock in its critical section, this causes a deadlock when the child (forked) process tries to log a message. A similar explanation can be found here: http://bugs.python.org/issue6721

Even though the issue above was opened against Python 3, I can reproduce it by running the attached script from that issue (lock_fork_thread_deadlock_demo.py) on RHEL with Python 2.7.5.
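For illustration, here is a minimal sketch of that failure mode (this is not the attached reproducer; names and timings are illustrative, and it assumes a Linux host with a /dev/log syslog socket): a background thread keeps the SysLogHandler lock busy, the process forks, and the child can block forever on the lock it inherited in the locked state.

# Minimal sketch of the fork-vs-logging-lock deadlock (illustrative only,
# not the attached deadlock_syslog.py); assumes a Linux host with /dev/log.
import logging
import logging.handlers
import os
import threading
import time

log = logging.getLogger("demo")
log.setLevel(logging.INFO)
# SysLogHandler.emit() acquires the handler's internal lock while writing.
log.addHandler(logging.handlers.SysLogHandler(address='/dev/log'))

def writer():
    while True:
        log.info("parent thread logging")

# A background thread keeps the handler lock busy...
t = threading.Thread(target=writer)
t.daemon = True
t.start()
time.sleep(0.1)

# ...so when we fork, the child can inherit the lock in the locked state.
pid = os.fork()
if pid == 0:
    # There is no thread in the child to release the lock, so this call can
    # block forever -- the deadlock described in http://bugs.python.org/issue6721
    log.info("child logging after fork")
    os._exit(0)
os.waitpid(pid, 0)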


Version-Release number of selected component (if applicable):


How reproducible:

1. Save the attached logging.conf (a minimal illustrative example is sketched after these steps) and, in neutron.conf, set the following under the [DEFAULT] section:

log_config=/etc/neutron/logging.conf
logging_context_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" request_id="%(request_id)s" user="%(user)s" tenant="%(tenant)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
logging_default_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
logging_exception_prefix="!!!NL!!! %(process)d TRACE %(name)s %(instance)s"

2. Run neutron-server command like:
/usr/bin/python2 /usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-server 

3. Check qplugin queue in rabbitmq:
sudo rabbitmqctl list_queues name messages messages_ready messages_unacknowledged consumers

You will notice that qplugin has 0 consumers

Also, check neutron agent-list; you will see that all agents are reported as dead.
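A minimal logging.conf of the kind referenced in step 1 might look like the following (hypothetical sketch; the actual attached file may differ), routing all records through SysLogHandler:

[loggers]
keys = root

[handlers]
keys = syslog

[formatters]
keys = default

[logger_root]
level = INFO
handlers = syslog

[handler_syslog]
class = handlers.SysLogHandler
args = ('/dev/log',)
formatter = default

[formatter_default]
format = %(asctime)s %(process)d %(levelname)s %(name)s %(message)s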


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Charles Crouch 2016-04-27 14:35:57 UTC
I'm guessing we only started seeing this now because https://bugzilla.redhat.com/show_bug.cgi?id=1322547 just got fixed.

Comment 2 kahou 2016-04-27 21:13:32 UTC
I have attached a script which uses the syslog handler, and I can see that the deadlock occurs in the child process.

Comment 3 kahou 2016-04-27 21:15:43 UTC
Created attachment 1151626 [details]
logging conf which run with deadlock_syslog.py

Comment 4 kahou 2016-04-27 21:16:36 UTC
Created attachment 1151627 [details]
script which reproduces the issue with syslog handler

Comment 5 Charles Crouch 2016-04-28 14:30:40 UTC
Just to clarify, while this problem was initially observed in Neutron, the subsequent reproduction steps which Kahou has posted don't utilize Neutron at all. Therefore marking this as a Python-related bug.

Comment 10 Assaf Muller 2016-06-08 13:18:43 UTC
An upstream patch that solves the issue in Neutron was merged. We'll begin work on backporting it.

Comment 16 Chet Burgess 2016-06-22 21:26:28 UTC
Can you please provide a link to the upstream neutron commit that addresses this issue? Thank you.

Comment 17 Jakub Libosvar 2016-06-23 07:56:30 UTC
(In reply to Chet Burgess from comment #16)
> Can you please provide a link to the upstream neutron commit that addresses
> this issue? Thank you.

It's https://review.openstack.org/#/c/313277/
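For background only, here is a generic illustration of how this class of deadlock is commonly avoided (it may not match what that review actually does): the child process re-creates the logging handler locks immediately after fork, so it never waits on a lock inherited in the locked state.

# Generic illustration of one common mitigation (not necessarily the approach
# taken in the review above): re-create logging handler locks in the child
# right after fork so it never blocks on a lock inherited in the locked state.
import logging
import os

logging.basicConfig(level=logging.INFO)

def reset_logging_locks():
    # Handler.createLock() replaces the handler's internal lock with a fresh one.
    for handler in logging.getLogger().handlers:
        handler.createLock()

pid = os.fork()
if pid == 0:
    reset_logging_locks()  # child: discard lock state inherited from the parent
    logging.getLogger().info("logging is safe in the child again")
    os._exit(0)
os.waitpid(pid, 0)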

Comment 27 Alexander Stafeyev 2016-07-07 07:18:40 UTC
Hi Jakub
what commit is the correct one ? the attached one has gaps 

Tnx

Comment 29 Jakub Libosvar 2016-07-07 08:38:03 UTC
(In reply to Alexander Stafeyev from comment #27)
> Hi Jakub
> what commit is the correct one ? the attached one has gaps 
> 
> Tnx

Can you be more specific? What errors do you see?

Comment 30 Toni Freger 2016-07-07 08:52:15 UTC
The code was tested on the latest puddle:
openstack-neutron-7.0.4-8.el7ost.noarch

Comment 32 errata-xmlrpc 2016-07-20 23:55:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1473