A race condition in the SIGTERM and SIGINT signal handlers made it possible for worker processes to ignore incoming SIGTERM signals. When child processes of OpenStack services received two SIGTERM signals in quick succession, some worker processes could fail to handle them and would remain active. Whenever this occurred, the following AssertionError exception message appeared in the logs:
Cannot switch to MAINLOOP from MAINLOOP
This release includes an updated oslo.service that fixes the race condition, ensuring that SIGTERM signals are handled correctly.
The bug is a race condition in oslo.service. oslo.service is not a daemon, but a library used by various OpenStack services like keystone or nova. The race condition occurs when two SIGTERM signals are received in quick succession.
It looks like keystone (and other OpenStack services) are configured in systemd to send SIGTERM to all processes of the cgroup, not only to the main process. The problem is that oslo.service then sends a second SIGTERM to all child processes. The services should be configured with KillMode=process so that systemd only sends SIGTERM to the main process.
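To make the double delivery concrete, here is a small Python model of who receives SIGTERM under each setting. It is only a sketch: the function name and the process labels are made up for illustration, and it assumes the main process forwards exactly one SIGTERM to each child, as described above.

```python
def sigterms_per_process(kill_mode, n_children=2, parent_forwards=True):
    """Count SIGTERM deliveries per process in a simplified model.

    Hypothetical helper for illustration only; it does not call systemd.
    kill_mode is "control-group" (signal every process in the cgroup)
    or "process" (signal only the main process).
    """
    counts = {"parent": 0}
    for i in range(n_children):
        counts["child-%d" % i] = 0

    if kill_mode == "control-group":
        # systemd signals every process in the service's cgroup.
        for name in counts:
            counts[name] += 1
    elif kill_mode == "process":
        # systemd signals only the main process.
        counts["parent"] += 1
    else:
        raise ValueError("unsupported kill_mode: %r" % (kill_mode,))

    if parent_forwards:
        # Assumption: the main process forwards SIGTERM to each child once.
        for name in counts:
            if name != "parent":
                counts[name] += 1
    return counts
```

In this model, "control-group" leaves every child with two SIGTERM signals, which is the situation that triggers the race, while "process" leaves each child with exactly one.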
I also have a fix for oslo.service that addresses the root cause of the race condition, so that two SIGTERM signals sent in quick succession are handled correctly.
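For illustration only, and not the actual oslo.service patch: one general way to make a handler tolerate repeated SIGTERM signals is to keep the handler trivial (record the request, take no locks, switch no greenlets) and do the real shutdown work once, outside the handler. The class and method names below are hypothetical.

```python
import signal


class GracefulStopper(object):
    """Hypothetical helper: the handler only records that a stop was requested."""

    def __init__(self):
        self.stop_requested = False
        self.signals_seen = 0
        self.shutdowns = 0

    def install(self):
        # Must be called from the main thread, as required by signal.signal().
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sigterm(self, signo, frame):
        # Safe to run twice, even nested: only plain attributes are touched,
        # no lock is acquired and nothing blocks or switches context.
        self.signals_seen += 1
        self.stop_requested = True

    def shutdown_once(self):
        """Run from the main loop; returns True only for the first request."""
        if self.stop_requested and self.shutdowns == 0:
            self.shutdowns += 1
            return True
        return False
```

A main loop would call install() at startup and poll shutdown_once() between units of work; a second SIGTERM then only sets a flag that is already set.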
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHEA-2016-0603.html
Filing against this oslo package because python-oslo-service is currently missing in Bugzilla.

python-oslo-service-0.9.0-2.5.el7ost.noarch

Stopping services: systemctl stop ..... often causes tracebacks, and the services fail to stop. For some reason it's easier to trigger with nova-conductor, but we have seen this problem across different services:

2015-12-04 08:47:59.128 19415 ERROR oslo_service.service [req-4c7d45e7-8220-42cd-9ecd-903566198453 - - - - -] Unhandled exception
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service Traceback (most recent call last):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 377, in _child_wait_for_exit_or_signal
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     launcher.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 204, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.services.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 625, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     service.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 591, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self._done.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return hubs.get_hub().switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.greenlet.switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 346, in run
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.wait(sleep_time)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 85, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     presult = self.do_poll(seconds)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 62, in do_poll
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.poll.poll(seconds)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 160, in _handle_signals
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     handler(signo, frame)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 355, in _sigterm
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     SignalHandler().clear()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 116, in __call__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with lockutils.lock('singleton_lock', semaphores=cls._semaphores):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.gen.next()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 195, in lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     int_lock = internal_lock(name, semaphores=semaphores)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 160, in internal_lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return semaphores.get(name)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 109, in get
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     sem = threading.Semaphore()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 423, in Semaphore
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return _Semaphore(*args, **kwargs)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 439, in __init__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.__cond = Condition(Lock())
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 252, in Condition
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return _Condition(*args, **kwargs)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 260, in __init__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     _Verbose.__init__(self, verbose)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 160, in _handle_signals
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     handler(signo, frame)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 355, in _sigterm
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     SignalHandler().clear()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 116, in __call__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with lockutils.lock('singleton_lock', semaphores=cls._semaphores):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.gen.next()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 195, in lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     int_lock = internal_lock(name, semaphores=semaphores)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 160, in internal_lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return semaphores.get(name)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 105, in get
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with self._lock:
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/semaphore.py", line 127, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.acquire()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/semaphore.py", line 113, in acquire
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     hubs.get_hub().switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 280, in switch
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     assert cur is not self.greenlet, 'Cannot switch to MAINLOOP from MAINLOOP'
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service AssertionError: Cannot switch to MAINLOOP from MAINLOOP
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service