Created attachment 1151143 [details]
Neutron server runs into deadlock when syslog config is enabled

Description of problem:
I tried to turn on syslog logging and the corresponding configuration (see attachment) for the Neutron server, but the server does not create any consumer for the QPLUGIN queue. As a result, all the Neutron agents stop functioning because they cannot communicate with the Neutron server.

Digging deeper, it looks like the Neutron server forks the RPC workers when it creates the QPLUGIN consumer. Because we use SyslogHandler, which takes a lock around its critical section, the forked child process deadlocks when it tries to log a message: if the lock is held at the moment of the fork, the child inherits it in the locked state and has no thread left to release it.

A similar explanation can be found here: http://bugs.python.org/issue6721
Even though that issue was opened against Python 3, I can reproduce it on RH with Python 2.7.5 by running the script attached to that issue (lock_fork_thread_deadlock_demo.py).

How reproducible:
1. Save the attached logging.conf and, in neutron.conf, set the following under the DEFAULT section:

    log_config=/etc/neutron/logging.conf
    logging_context_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" request_id="%(request_id)s" user="%(user)s" tenant="%(tenant)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
    logging_default_format_string="1 %(asctime)sZ mcp1.paslab013000.mc.metacloud.in neutron-server %(process)d - [MetaCloud@40521 levelname="%(levelname)s" component="neutron-server" funcname="%(name)s" instance="%(instance)s" lineno="%(pathname)s:%(lineno)d"] %(name)s %(message)s"
    logging_exception_prefix="!!!NL!!! %(process)d TRACE %(name)s %(instance)s"

2. Run the neutron-server command, for example:

    /usr/bin/python2 /usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/server --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-server

3. Check the qplugin queue in RabbitMQ:

    sudo rabbitmqctl list_queues name messages messages_ready messages_unacknowledged consumers

You will notice that qplugin has 0 consumers. Also check neutron agent-list; you will see that all agents are reported as dead.
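For illustration, the fork-while-locked mechanism described above can be sketched in a few lines. This is not the attached reproducer and not Neutron code: the function name child_can_acquire_lock is hypothetical, a plain threading.Lock stands in for the handler's lock, and the sketch assumes Python 3 on a POSIX system (it uses the timeout argument of Lock.acquire so that the child gives up instead of hanging forever, which Python 2.7 does not offer).

```python
import os
import threading


def child_can_acquire_lock(timeout=1.0):
    """Fork while another thread holds a lock.

    Returns True if the forked child managed to acquire the lock
    within `timeout` seconds, False otherwise.
    """
    lock = threading.Lock()      # stand-in for a logging handler's lock (assumption)
    held = threading.Event()     # set once the helper thread owns the lock
    release = threading.Event()  # tells the helper thread to let go

    def holder():
        with lock:
            held.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    held.wait()  # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied. The lock's memory was
        # copied in its locked state, and the thread that owned it does not
        # exist here, so nothing can ever release it.
        got_it = lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)

    # Parent: collect the child's verdict, then clean up the helper thread.
    _, status = os.waitpid(pid, 0)
    release.set()
    thread.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire_lock())
```

With an untimed acquire, as a logging handler would do, the child would block indefinitely instead of reporting failure, which would match the symptom of RPC workers that never register a consumer.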
I'm guessing we only started seeing this now because https://bugzilla.redhat.com/show_bug.cgi?id=1322547 just got fixed.
I have attached a script that uses the syslog handler, and with it I can see the deadlock occur in the child process.
Created attachment 1151626 [details] logging conf which run with deadlock_syslog.py
Created attachment 1151627 [details] script which reproduces the issue with syslog handler
Just to clarify: while this problem was initially observed in Neutron, the subsequent reproduction steps that Kahou has posted don't use Neutron at all. Therefore, I am marking this as a Python-related bug.
An upstream patch that solves the issue in Neutron was merged. We'll begin work on backporting it.
Can you please provide a link to the upstream neutron commit that addresses this issue? Thank you.
(In reply to Chet Burgess from comment #16)
> Can you please provide a link to the upstream neutron commit that addresses
> this issue? Thank you.

It's https://review.openstack.org/#/c/313277/
https://github.com/openstack/neutron/commit/483c5982c020ff21ceecf1d575c2d8fad2937d6e
Hi Jakub,
which commit is the correct one? The attached one has gaps.

Thanks
(In reply to Alexander Stafeyev from comment #27)
> Hi Jakub
> what commit is the correct one ? the attached one has gaps
>
> Tnx

Can you be more specific? What errors do you see?
The code was tested on the latest puddle: openstack-neutron-7.0.4-8.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1473