Bug 1330778 - Deadlock occurs when we turn on syslog and forking a process
Summary: Deadlock occurs when we turn on syslog and forking a process
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: async
: 8.0 (Liberty)
Assignee: Jakub Libosvar
QA Contact: Toni Freger
URL:
Whiteboard: hot
Depends On:
Blocks: 1194008
TreeView+ depends on / blocked
 
Reported: 2016-04-26 23:51 UTC by kahou
Modified: 2019-11-14 07:52 UTC (History)
14 users (show)

Fixed In Version: openstack-neutron-7.0.4-8.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1346759 (view as bug list)
Environment:
Last Closed: 2016-07-20 23:55:21 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Neutron server runs into deadlock when syslog config is enabled (1.45 KB, text/plain)
2016-04-26 23:51 UTC, kahou
no flags Details
logging conf which run with deadlock_syslog.py (1.44 KB, text/plain)
2016-04-27 21:15 UTC, kahou
no flags Details
script which reproduces the issue with syslog handler (745 bytes, text/plain)
2016-04-27 21:16 UTC, kahou
no flags Details


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 313277 0 None None None 2016-06-08 13:18:04 UTC
Python 6721 0 None None None 2016-05-03 12:46:36 UTC
Red Hat Product Errata RHSA-2016:1473 0 normal SHIPPED_LIVE Low: openstack-neutron security and bug fix update 2016-07-21 03:53:43 UTC

Description kahou 2016-04-26 23:51:47 UTC
Created attachment 1151143 [details]
Neutron server runs into deadlock when syslog config is enabled

Description of problem:

I was trying to turn on the syslog and corresponding configuration (see attached statement) for Neutron server but it doesn't create any consumer for the QPLUGIN queue. This causes all the Neutron agents stop functioning as it cannot communicate with the Neutron server.

Digging it deeper and looks like Neutron server fork the RPC worker when they create the QPLUGIN consumer. But since we use SyslogHandler which create lock in the critical section and this will cause deadlock when the child (Forked) process is trying to log message. Simialar explanation can be found here: http://bugs.python.org/issue6721

Even this issue above is opened against for Python 3, I can reproduce it by running the attached script (lock_fork_thread_deadlock_demo.py) in the issue in RH with Python 2.7.5 .


Version-Release number of selected component (if applicable):


How reproducible:

1. Save the attached logging.conf and in the neutron.conf, set the following under the DEFAULT section:

log_config=/etc/neutron/logging.conf
logging_context_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" request_id="%(request_id)s" user="%(user)s" tenant="%(tenant)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
logging_default_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
logging_exception_prefix="!!!NL!!! %(process)d TRACE %(name)s %(instance)s"

2. Run neutron-server command like:
/usr/bin/python2 /usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-server 

3. Check qplugin queue in rabbitmq:
sudo rabbitmqctl list_queues name messages messages_ready messages_unacknowledged consumers

You will notice that qplugin has 0 consumers

Also, check neutron agent-list, you will see all agents are reports as dead.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Charles Crouch 2016-04-27 14:35:57 UTC
I'm guessing we only started seeing this now because https://bugzilla.redhat.com/show_bug.cgi?id=1322547 just got fixed.

Comment 2 kahou 2016-04-27 21:13:32 UTC
I have attached a script which uses syslog handler and I can see the deadlock occurs in the child process.

Comment 3 kahou 2016-04-27 21:15:43 UTC
Created attachment 1151626 [details]
logging conf which run with deadlock_syslog.py

Comment 4 kahou 2016-04-27 21:16:36 UTC
Created attachment 1151627 [details]
script which reproduces the issue with syslog handler

Comment 5 Charles Crouch 2016-04-28 14:30:40 UTC
Just to clarify, while this problem was initially observed in Neutron, the subsequent reproduction steps which Kahou has posted don't utilize Neutron at all. Therefore marking this a python related bug.

Comment 10 Assaf Muller 2016-06-08 13:18:43 UTC
An upstream patch that solves the issue in Neutron was merged. We'll begin work on backporting it.

Comment 16 Chet Burgess 2016-06-22 21:26:28 UTC
Can you please provide a link to the upstream neutron commit that addresses this issue? Thank you.

Comment 17 Jakub Libosvar 2016-06-23 07:56:30 UTC
(In reply to Chet Burgess from comment #16)
> Can you please provide a link to the upstream neutron commit that addresses
> this issue? Thank you.

It's https://review.openstack.org/#/c/313277/

Comment 27 Alexander Stafeyev 2016-07-07 07:18:40 UTC
Hi Jakub
what commit is the correct one ? the attached one has gaps 

Tnx

Comment 29 Jakub Libosvar 2016-07-07 08:38:03 UTC
(In reply to Alexander Stafeyev from comment #27)
> Hi Jakub
> what commit is the correct one ? the attached one has gaps 
> 
> Tnx

Can you be more specific? What errors do you see?

Comment 30 Toni Freger 2016-07-07 08:52:15 UTC
The code tested on latest puddle 
openstack-neutron-7.0.4-8.el7ost.noarch

Comment 32 errata-xmlrpc 2016-07-20 23:55:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1473


Note You need to log in before you can comment on or make changes to this bug.