Red Hat Bugzilla – Bug 1473710
Keystone periodically goes down, leaving the cloud unusable.
Last modified: 2017-08-07 11:29:35 EDT
Description of problem: There seems to be an issue with keystone running within httpd. When horizon stops working, so do stack commands such as nova list, and those commands return keystone errors. The nova logs also show "failed to fetch token from identity server" when the problem occurs.
Version-Release number of selected component (if applicable):
Unknown. The customer encounters the issue and nothing in the stack works. The workaround is to restart httpd, after which everything works again. After some time, following an unknown trigger, it happens again, usually within a couple of hours.
Actual results: keystone goes down randomly.
Expected results: keystone stays up.
Debug logging was enabled for keystone; however, we see nothing in keystone.log.
In /var/log/httpd/keystone_wsgi_admin_error.log we see lots of errors.
Mostly the following, repeated many times:
[Thu Jul 20 06:44:04.233724 2017] [:error] [pid 936859] File "/usr/lib64/python2.7/contextlib.py", line 84, in helper
[Thu Jul 20 06:44:04.233736 2017] [:error] [pid 936859] <type 'exceptions.TypeError'>: 'NoneType' object is not callable
[Thu Jul 20 06:44:04.289042 2017] [:error] [pid 936856] Exception in thread Thread-1 (most likely raised during interpreter shutdown):
[Thu Jul 20 06:44:04.289107 2017] [:error] [pid 936856] Traceback (most recent call last):
[Thu Jul 20 06:44:04.289122 2017] [:error] [pid 936856] File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
[Thu Jul 20 06:44:04.289130 2017] [:error] [pid 936856] File "/usr/lib64/python2.7/threading.py", line 764, in run
[Thu Jul 20 06:44:04.289139 2017] [:error] [pid 936856] File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 985, in _heartbeat_thread_job
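To gauge how many WSGI worker processes are affected, the error lines above can be tallied per pid. The following is only a rough sketch: the function name `summarize_wsgi_errors` and the sample list are made up for illustration, and it assumes every relevant line carries a `[pid N]` field as in the excerpt.

```python
import re
from collections import Counter

# Matches the "[pid 936859]" field that httpd puts on every error-log line.
PID_RE = re.compile(r"\[pid (\d+)\]")
# Text Python prints when a thread dies while the interpreter is being torn down.
SHUTDOWN_MARKER = "most likely raised during interpreter shutdown"


def summarize_wsgi_errors(lines):
    """Return (error-line count per pid, set of pids that logged the shutdown marker)."""
    per_pid = Counter()
    shutdown_pids = set()
    for line in lines:
        match = PID_RE.search(line)
        if not match:
            continue  # not an httpd-formatted line
        pid = int(match.group(1))
        per_pid[pid] += 1
        if SHUTDOWN_MARKER in line:
            shutdown_pids.add(pid)
    return per_pid, shutdown_pids


# Three lines copied from the excerpt above; in practice, pass the open log file.
sample = [
    "[Thu Jul 20 06:44:04.233736 2017] [:error] [pid 936859] <type 'exceptions.TypeError'>: 'NoneType' object is not callable",
    "[Thu Jul 20 06:44:04.289042 2017] [:error] [pid 936856] Exception in thread Thread-1 (most likely raised during interpreter shutdown):",
    "[Thu Jul 20 06:44:04.289107 2017] [:error] [pid 936856] Traceback (most recent call last):",
]
per_pid, shutdown_pids = summarize_wsgi_errors(sample)
for pid, count in per_pid.most_common():
    print(pid, count, "shutdown" if pid in shutdown_pids else "")
```

A large number of distinct pids all logging the shutdown marker at around the same time would suggest the workers are being torn down together (for example by an httpd restart) rather than failing one by one.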
### Same thing in /var/log/httpd/keystone_wsgi_main_error.log
### In /var/log/httpd/error_log we see the following repeated:
[Thu Jul 20 06:44:15.874675 2017] [core:notice] [pid 373561] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Thu Jul 20 08:29:11.506494 2017] [mpm_prefork:error] [pid 373561] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
[Thu Jul 20 13:52:44.873645 2017] [mpm_prefork:notice] [pid 373561] AH00170: caught SIGWINCH, shutting down gracefully
[Thu Jul 20 13:54:23.156341 2017] [core:notice] [pid 930245] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Thu Jul 20 13:54:23.157703 2017] [suexec:notice] [pid 930245] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Jul 20 13:54:23.164645 2017] [auth_digest:notice] [pid 930245] AH01757: generating secret for digest authentication ...
[Thu Jul 20 13:54:23.252579 2017] [mpm_prefork:notice] [pid 930245] AH00163: Apache/2.4.6 (Red Hat Enterprise Linux) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Thu Jul 20 13:54:23.252621 2017] [core:notice] [pid 930245] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
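To check whether the "couple of hours" interval lines up with httpd running out of workers, the time between each httpd start (the AH00094 "Command line" notice) and the following AH00161 message from the same parent pid can be measured. This is a sketch under assumptions: the helper names `parse` and `time_to_saturation` are hypothetical, and the timestamp parsing assumes an English/C locale for the day and month names.

```python
import re
from datetime import datetime

# [timestamp] [module:level] [pid N] message   (module may be empty, as in "[:error]")
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[(?P<module>[^:\]]*):(?P<level>\w+)\] "
    r"\[pid (?P<pid>\d+)\] (?P<msg>.*)"
)


def parse(line):
    """Return (timestamp, pid, message) for an httpd error-log line, or None."""
    match = LINE_RE.match(line)
    if not match:
        return None
    ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S.%f %Y")
    return ts, int(match.group("pid")), match.group("msg")


def time_to_saturation(lines):
    """List (pid, elapsed) from each AH00094 start notice to the next AH00161 of that pid."""
    started = {}
    results = []
    for line in lines:
        record = parse(line)
        if record is None:
            continue
        ts, pid, msg = record
        if msg.startswith("AH00094"):
            started[pid] = ts
        elif msg.startswith("AH00161") and pid in started:
            results.append((pid, ts - started.pop(pid)))
    return results


# Lines copied from the excerpt above; in practice, pass the open error_log file.
sample = [
    "[Thu Jul 20 06:44:15.874675 2017] [core:notice] [pid 373561] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'",
    "[Thu Jul 20 08:29:11.506494 2017] [mpm_prefork:error] [pid 373561] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting",
    "[Thu Jul 20 13:52:44.873645 2017] [mpm_prefork:notice] [pid 373561] AH00170: caught SIGWINCH, shutting down gracefully",
]
results = time_to_saturation(sample)
for pid, elapsed in results:
    print(pid, elapsed)  # 373561 1:44:55.631819
```

For the lines shown, httpd reached MaxRequestWorkers about 1 hour 45 minutes after starting, which is consistent with the reported "couple of hours" before the outage recurs. It does not by itself show what is holding the workers busy.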
"controller0_log.tar.gz/var/log/httpd/error_log" 176L, 22934C