Bug 1330778

Summary: Deadlock occurs when we turn on syslog and fork a process
Product: Red Hat OpenStack Reporter: kahou <kalei>
Component: openstack-neutron    Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED ERRATA QA Contact: Toni Freger <tfreger>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.0 (Juno)    CC: aathomas, amuller, astafeye, charcrou, cheburge, chrisw, jdonohue, jlibosva, nyechiel, oblaut, pviktori, skulkarn, srevivo, wlehman
Target Milestone: async    Keywords: Triaged, ZStream
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: hot
Fixed In Version: openstack-neutron-7.0.4-8.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1346759 (view as bug list) Environment:
Last Closed: 2016-07-20 23:55:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1194008    
Attachments (flags: none):
  Neutron server runs into deadlock when syslog config is enabled
  logging conf which run with deadlock_syslog.py
  script which reproduces the issue with syslog handler

Description kahou 2016-04-26 23:51:47 UTC
Created attachment 1151143 [details]
Neutron server runs into deadlock when syslog config is enabled

Description of problem:

I was trying to turn on syslog and the corresponding configuration (see attached statement) for the Neutron server, but it doesn't create any consumer for the QPLUGIN queue. This causes all the Neutron agents to stop functioning, as they cannot communicate with the Neutron server.

Digging deeper, it looks like the Neutron server forks the RPC workers when it creates the QPLUGIN consumer. Since we use SysLogHandler, which takes a lock in its critical section, this causes a deadlock when the child (forked) process tries to log a message. A similar explanation can be found here: http://bugs.python.org/issue6721

Even though the issue above was opened against Python 3, I can reproduce it by running the attached script from that issue (lock_fork_thread_deadlock_demo.py) on RHEL with Python 2.7.5.
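For illustration, here is a minimal sketch of that failure mode (this is not the attached reproducer; names and timings are illustrative, and it assumes a Linux host with a /dev/log syslog socket): a background thread keeps the SysLogHandler lock busy, the process forks, and the child can block forever on the lock it inherited in the locked state.

# Minimal sketch of the fork-vs-logging-lock deadlock (illustrative only,
# not the attached deadlock_syslog.py); assumes a Linux host with /dev/log.
import logging
import logging.handlers
import os
import threading
import time

log = logging.getLogger("demo")
log.setLevel(logging.INFO)
# SysLogHandler.emit() acquires the handler's internal lock while writing.
log.addHandler(logging.handlers.SysLogHandler(address='/dev/log'))

def writer():
    while True:
        log.info("parent thread logging")

# A background thread keeps the handler lock busy...
t = threading.Thread(target=writer)
t.daemon = True
t.start()
time.sleep(0.1)

# ...so when we fork, the child can inherit the lock in the locked state.
pid = os.fork()
if pid == 0:
    # There is no thread in the child to release the lock, so this call can
    # block forever -- the deadlock described in http://bugs.python.org/issue6721
    log.info("child logging after fork")
    os._exit(0)
os.waitpid(pid, 0)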


Version-Release number of selected component (if applicable):


How reproducible:

1. Save the attached logging.conf (a minimal illustrative example is sketched after these steps) and, in neutron.conf, set the following under the [DEFAULT] section:

log_config=/etc/neutron/logging.conf
logging_context_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" request_id="%(request_id)s" user="%(user)s" tenant="%(tenant)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
logging_default_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
logging_exception_prefix="!!!NL!!! %(process)d TRACE %(name)s %(instance)s"

2. Run neutron-server command like:
/usr/bin/python2 /usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-server 

3. Check qplugin queue in rabbitmq:
sudo rabbitmqctl list_queues name messages messages_ready messages_unacknowledged consumers

You will notice that qplugin has 0 consumers

Also, check neutron agent-list; you will see that all agents are reported as dead.
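A minimal logging.conf of the kind referenced in step 1 might look like the following (hypothetical sketch; the actual attached file may differ), routing all records through SysLogHandler:

[loggers]
keys = root

[handlers]
keys = syslog

[formatters]
keys = default

[logger_root]
level = INFO
handlers = syslog

[handler_syslog]
class = handlers.SysLogHandler
args = ('/dev/log',)
formatter = default

[formatter_default]
format = %(asctime)s %(process)d %(levelname)s %(name)s %(message)s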


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Charles Crouch 2016-04-27 14:35:57 UTC
I'm guessing we only started seeing this now because https://bugzilla.redhat.com/show_bug.cgi?id=1322547 just got fixed.

Comment 2 kahou 2016-04-27 21:13:32 UTC
I have attached a script which uses the syslog handler, and I can see that the deadlock occurs in the child process.

Comment 3 kahou 2016-04-27 21:15:43 UTC
Created attachment 1151626 [details]
logging conf which run with deadlock_syslog.py

Comment 4 kahou 2016-04-27 21:16:36 UTC
Created attachment 1151627 [details]
script which reproduces the issue with syslog handler

Comment 5 Charles Crouch 2016-04-28 14:30:40 UTC
Just to clarify, while this problem was initially observed in Neutron, the subsequent reproduction steps which Kahou has posted don't utilize Neutron at all. Therefore marking this as a Python-related bug.

Comment 10 Assaf Muller 2016-06-08 13:18:43 UTC
An upstream patch that solves the issue in Neutron was merged. We'll begin work on backporting it.

Comment 16 Chet Burgess 2016-06-22 21:26:28 UTC
Can you please provide a link to the upstream neutron commit that addresses this issue? Thank you.

Comment 17 Jakub Libosvar 2016-06-23 07:56:30 UTC
(In reply to Chet Burgess from comment #16)
> Can you please provide a link to the upstream neutron commit that addresses
> this issue? Thank you.

It's https://review.openstack.org/#/c/313277/
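For background only, here is a generic illustration of how this class of deadlock is commonly avoided (it may not match what that review actually does): the child process re-creates the logging handler locks immediately after fork, so it never waits on a lock inherited in the locked state.

# Generic illustration of one common mitigation (not necessarily the approach
# taken in the review above): re-create logging handler locks in the child
# right after fork so it never blocks on a lock inherited in the locked state.
import logging
import os

logging.basicConfig(level=logging.INFO)

def reset_logging_locks():
    # Handler.createLock() replaces the handler's internal lock with a fresh one.
    for handler in logging.getLogger().handlers:
        handler.createLock()

pid = os.fork()
if pid == 0:
    reset_logging_locks()  # child: discard lock state inherited from the parent
    logging.getLogger().info("logging is safe in the child again")
    os._exit(0)
os.waitpid(pid, 0)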

Comment 27 Alexander Stafeyev 2016-07-07 07:18:40 UTC
Hi Jakub
what commit is the correct one ? the attached one has gaps 

Tnx

Comment 29 Jakub Libosvar 2016-07-07 08:38:03 UTC
(In reply to Alexander Stafeyev from comment #27)
> Hi Jakub
> what commit is the correct one ? the attached one has gaps 
> 
> Tnx

Can you be more specific? What errors do you see?

Comment 30 Toni Freger 2016-07-07 08:52:15 UTC
The code was tested on the latest puddle:
openstack-neutron-7.0.4-8.el7ost.noarch

Comment 32 errata-xmlrpc 2016-07-20 23:55:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1473