Bug 1288528 - services can't stop due to broken oslo service
Summary: services can't stop due to broken oslo service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-service
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Victor Stinner
QA Contact: Leonid Natapov
URL:
Whiteboard:
Duplicates: 1290599
Depends On:
Blocks: 1261979
 
Reported: 2015-12-04 13:59 UTC by Fabio Massimo Di Nitto
Modified: 2020-05-14 15:04 UTC
CC List: 12 users

Fixed In Version: python-oslo-service-0.9.0-2.6.el7ost
Doc Type: Bug Fix
Doc Text:
A race condition in the SIGTERM and SIGINT signal handlers made it possible for worker processes to ignore incoming SIGTERM signals. When two SIGTERM signals were received in quick succession by child processes of OpenStack services, some worker processes could fail to handle them; as a result, those processes would remain active. Whenever this occurred, the following AssertionError exception message appeared in the logs: "Cannot switch to MAINLOOP from MAINLOOP". This release includes an updated oslo.service that fixes the race condition, thereby ensuring that SIGTERM signals are handled correctly.
Clone Of:
Environment:
Last Closed: 2016-04-07 21:15:58 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID | Private | Priority | Status | Summary | Last Updated
Launchpad 1524907 | 0 | None | None | None | Never
OpenStack gerrit 256267 | 0 | None | MERGED | Fix a race condition in signal handlers | 2020-08-18 12:28:40 UTC
Red Hat Product Errata RHEA-2016:0603 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 8 Enhancement Advisory | 2016-04-08 00:53:53 UTC

Description Fabio Massimo Di Nitto 2015-12-04 13:59:00 UTC
Filing against this oslo package because python-oslo-service is currently missing in Bugzilla.

python-oslo-service-0.9.0-2.5.el7ost.noarch

Stopping services:

systemctl stop .....

often causes tracebacks, and services fail to stop. For some reason it's easier to trigger with nova-conductor, but we have seen this problem across different services:

2015-12-04 08:47:59.128 19415 ERROR oslo_service.service [req-4c7d45e7-8220-42cd-9ecd-903566198453 - - - - -] Unhandled exception
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service Traceback (most recent call last):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 377, in _child_wait_for_exit_or_signal
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     launcher.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 204, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.services.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 625, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     service.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 591, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self._done.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return hubs.get_hub().switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.greenlet.switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 346, in run
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.wait(sleep_time)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 85, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     presult = self.do_poll(seconds)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 62, in do_poll
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.poll.poll(seconds)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 160, in _handle_signals
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     handler(signo, frame)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 355, in _sigterm
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     SignalHandler().clear()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 116, in __call__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with lockutils.lock('singleton_lock', semaphores=cls._semaphores):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.gen.next()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 195, in lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     int_lock = internal_lock(name, semaphores=semaphores)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 160, in internal_lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return semaphores.get(name)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 109, in get
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     sem = threading.Semaphore()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 423, in Semaphore
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return _Semaphore(*args, **kwargs)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 439, in __init__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.__cond = Condition(Lock())
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 252, in Condition
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return _Condition(*args, **kwargs)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 260, in __init__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     _Verbose.__init__(self, verbose)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 160, in _handle_signals
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     handler(signo, frame)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 355, in _sigterm
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     SignalHandler().clear()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 116, in __call__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with lockutils.lock('singleton_lock', semaphores=cls._semaphores):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.gen.next()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 195, in lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     int_lock = internal_lock(name, semaphores=semaphores)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 160, in internal_lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return semaphores.get(name)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 105, in get
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with self._lock:
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/semaphore.py", line 127, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.acquire()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/semaphore.py", line 113, in acquire
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     hubs.get_hub().switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 280, in switch
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     assert cur is not self.greenlet, 'Cannot switch to MAINLOOP from MAINLOOP'
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service AssertionError: Cannot switch to MAINLOOP from MAINLOOP
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service

Comment 2 Jon Schlueter 2015-12-04 14:33:17 UTC
might want to also file an upstream bug.

Comment 4 Victor Stinner 2015-12-10 17:04:02 UTC
The bug is a race condition in oslo.service. oslo.service is not a daemon but a library used by various OpenStack services such as keystone or nova. The race condition occurs when two SIGTERM signals are received in quick succession.

It looks like keystone (and other OpenStack services) are configured in systemd to send SIGTERM to all processes in the cgroup, not only to the main process. The problem is that oslo.service then sends a second SIGTERM to all child processes. The services should be configured with KillMode=process so that SIGTERM is sent only to the main process.
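
For illustration only, a minimal systemd drop-in that applies this recommendation; the unit name (openstack-nova-conductor.service) and drop-in file name are assumptions for the sake of the example, not taken from this report:

    # /etc/systemd/system/openstack-nova-conductor.service.d/killmode.conf
    # Hypothetical drop-in; adjust the unit name for the service in question.
    [Service]
    # On "systemctl stop", send SIGTERM only to the main process;
    # oslo.service itself forwards the signal to its worker children.
    KillMode=process

After creating the drop-in, systemctl daemon-reload followed by a restart of the service is needed for the change to take effect.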

I also have a fix for oslo.service that addresses the root cause of the race condition, so that two SIGTERM signals sent in quick succession are handled correctly.
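
Without reproducing the actual upstream patch here (see gerrit change 256267 linked above), a minimal sketch of the general pattern that avoids re-entrant SIGTERM handling: the handler itself does almost nothing except disarm further SIGTERM delivery and set a flag, and the real shutdown work happens in the main loop. The names below are illustrative, not oslo.service API:

    # Sketch only: safe handling of repeated SIGTERM (not the oslo.service fix itself).
    import signal
    import time

    _stop_requested = False

    def _sigterm(signo, frame):
        global _stop_requested
        # Ignore any further SIGTERM delivered while shutdown is in progress,
        # so the handler cannot be re-entered by a second signal.
        signal.signal(signal.SIGTERM, signal.SIG_IGN)
        # Keep the handler trivial: just record the shutdown request.
        _stop_requested = True

    signal.signal(signal.SIGTERM, _sigterm)

    if __name__ == "__main__":
        # The main loop notices the flag and performs the actual cleanup.
        while not _stop_requested:
            time.sleep(0.1)
        print("stopping cleanly")

The point is that no locks or greenlet switches happen inside the signal handler, which is what triggered the "Cannot switch to MAINLOOP from MAINLOOP" assertion in the traceback above.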

Comment 10 Leonid Natapov 2016-01-26 13:46:49 UTC
python-oslo-service-0.9.0-2.6.el7ost.noarch

The problem doesn't reproduce for me either.

Comment 12 errata-xmlrpc 2016-04-07 21:15:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html

Comment 13 Ken Gaillot 2016-05-16 18:14:21 UTC
*** Bug 1290599 has been marked as a duplicate of this bug. ***

