Bug 1288528 - services can't stop due to broken oslo service
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-service
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Victor Stinner
QA Contact: Leonid Natapov
Duplicates: 1290599
Depends On:
Blocks: 1261979
 
Reported: 2015-12-04 08:59 EST by Fabio Massimo Di Nitto
Modified: 2016-05-16 14:14 EDT
CC List: 12 users

See Also:
Fixed In Version: python-oslo-service-0.9.0-2.6.el7ost
Doc Type: Bug Fix
Doc Text:
A race condition in the SIGTERM and SIGINT signal handlers made it possible for worker processes to ignore incoming SIGTERM signals. When child processes of OpenStack services received two SIGTERM signals in quick succession, some worker processes could fail to handle them; as a result, those processes would remain active. Whenever this occurred, the following AssertionError exception message appeared in the logs: "Cannot switch to MAINLOOP from MAINLOOP". This release includes an updated oslo.service that fixes the race condition, thereby ensuring that SIGTERM signals are handled correctly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-07 17:15:58 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1524907 None None None Never
OpenStack gerrit 256267 None None None Never
Red Hat Product Errata RHEA-2016:0603 normal SHIPPED_LIVE Red Hat OpenStack Platform 8 Enhancement Advisory 2016-04-07 20:53:53 EDT

Description Fabio Massimo Di Nitto 2015-12-04 08:59:00 EST
Filing against this oslo package because python-oslo-service is currently missing as a component in Bugzilla.

python-oslo-service-0.9.0-2.5.el7ost.noarch

Stopping services:

systemctl stop .....

often causes tracebacks and services fail to stop. For some reason it's easier to trigger with nova-conductor, but we have seen this problem across different services:

2015-12-04 08:47:59.128 19415 ERROR oslo_service.service [req-4c7d45e7-8220-42cd-9ecd-903566198453 - - - - -] Unhandled exception
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service Traceback (most recent call last):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 377, in _child_wait_for_exit_or_signal
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     launcher.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 204, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.services.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 625, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     service.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 591, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self._done.wait()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return hubs.get_hub().switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.greenlet.switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 346, in run
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.wait(sleep_time)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 85, in wait
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     presult = self.do_poll(seconds)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/epolls.py", line 62, in do_poll
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.poll.poll(seconds)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 160, in _handle_signals
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     handler(signo, frame)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 355, in _sigterm
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     SignalHandler().clear()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 116, in __call__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with lockutils.lock('singleton_lock', semaphores=cls._semaphores):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.gen.next()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 195, in lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     int_lock = internal_lock(name, semaphores=semaphores)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 160, in internal_lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return semaphores.get(name)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 109, in get
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     sem = threading.Semaphore()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 423, in Semaphore
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return _Semaphore(*args, **kwargs)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 439, in __init__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.__cond = Condition(Lock())
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 252, in Condition
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return _Condition(*args, **kwargs)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/threading.py", line 260, in __init__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     _Verbose.__init__(self, verbose)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 160, in _handle_signals
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     handler(signo, frame)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 355, in _sigterm
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     SignalHandler().clear()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 116, in __call__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with lockutils.lock('singleton_lock', semaphores=cls._semaphores):
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return self.gen.next()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 195, in lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     int_lock = internal_lock(name, semaphores=semaphores)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 160, in internal_lock
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     return semaphores.get(name)
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 105, in get
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     with self._lock:
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/semaphore.py", line 127, in __enter__
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     self.acquire()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/semaphore.py", line 113, in acquire
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     hubs.get_hub().switch()
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 280, in switch
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service     assert cur is not self.greenlet, 'Cannot switch to MAINLOOP from MAINLOOP'
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service AssertionError: Cannot switch to MAINLOOP from MAINLOOP
2015-12-04 08:47:59.128 19415 ERROR oslo_service.service
Comment 2 Jon Schlueter 2015-12-04 09:33:17 EST
Might also want to file an upstream bug.
Comment 4 Victor Stinner 2015-12-10 12:04:02 EST
The bug is a race condition in oslo.service. oslo.service is not a daemon, but a library used by various OpenStack services such as keystone or nova. The race condition occurs when two SIGTERM signals are received in quick succession.

It looks like keystone (and other OpenStack services) are configured in systemd to send SIGTERM to all processes in the cgroup, not only to the main process. The problem is that oslo.service then sends a second SIGTERM to all child processes itself. The services should be configured with KillMode=process so that systemd only sends SIGTERM to the main process; a sketch of such a unit override is shown below.
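
For illustration only (this snippet is not part of the bug report, and the unit name openstack-nova-conductor.service is just an example), a systemd drop-in along these lines would apply that setting:

  # /etc/systemd/system/openstack-nova-conductor.service.d/killmode.conf
  # Let systemd signal only the main process; oslo.service itself
  # forwards the shutdown to its worker processes.
  [Service]
  KillMode=process

After creating the drop-in, run "systemctl daemon-reload" for the change to take effect.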

I also have a fix for oslo.service that addresses the root cause of the race condition, so that two SIGTERM signals sent in quick succession are handled correctly (the general idea is sketched below).
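
As a rough illustration of that idea only (this is not the actual oslo.service patch, and the GracefulShutdown class below is invented for the example): the signal handler is made idempotent and merely records the stop request, so a second SIGTERM arriving while the first is being processed cannot re-enter lock or greenlet code from signal-handler context.

  # Sketch only; not the real oslo.service fix.
  import signal

  class GracefulShutdown(object):
      def __init__(self):
          self._stop_requested = False
          # Register a SIGTERM handler that only sets a flag.
          signal.signal(signal.SIGTERM, self._on_sigterm)

      def _on_sigterm(self, signo, frame):
          # Idempotent: a second SIGTERM just sets the flag again;
          # it never re-enters the shutdown path.
          self._stop_requested = True

      def run(self, do_work, cleanup):
          # Hypothetical main loop: the flag is checked between work
          # items, and cleanup runs exactly once outside the handler.
          while not self._stop_requested:
              do_work()
          cleanup()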
Comment 10 Leonid Natapov 2016-01-26 08:46:49 EST
python-oslo-service-0.9.0-2.6.el7ost.noarch

The problem doesn't reproduce for me either.
Comment 12 errata-xmlrpc 2016-04-07 17:15:58 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html
Comment 13 Ken Gaillot 2016-05-16 14:14:21 EDT
*** Bug 1290599 has been marked as a duplicate of this bug. ***
