Bug 1473710 - Keystone periodically goes down leaving cloud unusable.
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-keystone
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Assigned To: John Dennis
Depends On:
Reported: 2017-07-21 09:24 EDT by Jeremy
Modified: 2017-09-18 13:48 EDT (History)
CC List: 3 users

Last Closed: 2017-08-07 11:29:35 EDT
Type: Bug

Description Jeremy 2017-07-21 09:24:53 EDT
Description of problem: There seems to be an issue with keystone running under httpd. When horizon stops working, so do stack commands such as nova list. Keystone errors also appear when trying stack commands, and the nova logs show "failed to fetch token from identity server" when the problem occurs.

Version-Release number of selected component (if applicable):

How reproducible:
Unknown. The customer encounters the issue and nothing in the stack works. The workaround is to restart httpd, after which everything works again. After some time, usually a couple of hours, an unknown trigger causes it to happen again.

Actual results:
keystone goes down randomly

Expected results:
keystone stays up.

Additional info:

Debug logging was enabled for keystone, but we see nothing in keystone.log.

In /var/log/httpd/keystone_wsgi_admin_error.log we see lots of errors.

Mostly repetitions of this:
[Thu Jul 20 06:44:04.233724 2017] [:error] [pid 936859]   File "/usr/lib64/python2.7/contextlib.py", line 84, in helper
[Thu Jul 20 06:44:04.233736 2017] [:error] [pid 936859] <type 'exceptions.TypeError'>: 'NoneType' object is not callable
[Thu Jul 20 06:44:04.289042 2017] [:error] [pid 936856] Exception in thread Thread-1 (most likely raised during interpreter shutdown):
[Thu Jul 20 06:44:04.289107 2017] [:error] [pid 936856] Traceback (most recent call last):
[Thu Jul 20 06:44:04.289122 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
[Thu Jul 20 06:44:04.289130 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 764, in run
[Thu Jul 20 06:44:04.289139 2017] [:error] [pid 936856]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 985, in _heartbeat_thread_job

### Same thing in /var/log/httpd/keystone_wsgi_main_error.log
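
Since both WSGI error logs are flooded with repeated lines, it may help to tally which messages dominate. The following is a minimal sketch, assuming the Apache 2.4 line layout seen in the excerpt above ([timestamp] [module:level] [pid N] message). The names LINE_RE and tally_messages are hypothetical helpers, not part of keystone or httpd.

```python
import re
from collections import Counter

# Matches the error-log prefix seen in the excerpt:
# [timestamp] [module:level] [pid N] message
# The module part may be empty, as in "[:error]".
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)\] (?P<msg>.*)$"
)


def tally_messages(lines):
    """Count how often each message text appears, ignoring timestamp and pid.

    Lines that do not match the expected layout are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("msg").strip()] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the excerpt above, used only as sample input.
    sample = [
        "[Thu Jul 20 06:44:04.233736 2017] [:error] [pid 936859] "
        "<type 'exceptions.TypeError'>: 'NoneType' object is not callable",
        "[Thu Jul 20 06:44:04.289107 2017] [:error] [pid 936856] "
        "Traceback (most recent call last):",
    ]
    for message, count in tally_messages(sample).most_common():
        print(count, message)
```

To run it against a real log, pass an open file object (for example the admin error log named above) to tally_messages instead of the sample list.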

### In /var/log/httpd/error_log we see repeated:

[Thu Jul 20 06:44:15.874675 2017] [core:notice] [pid 373561] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Thu Jul 20 08:29:11.506494 2017] [mpm_prefork:error] [pid 373561] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
[Thu Jul 20 13:52:44.873645 2017] [mpm_prefork:notice] [pid 373561] AH00170: caught SIGWINCH, shutting down gracefully
[Thu Jul 20 13:54:23.156341 2017] [core:notice] [pid 930245] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Thu Jul 20 13:54:23.157703 2017] [suexec:notice] [pid 930245] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Jul 20 13:54:23.164645 2017] [auth_digest:notice] [pid 930245] AH01757: generating secret for digest authentication ...
[Thu Jul 20 13:54:23.252579 2017] [mpm_prefork:notice] [pid 930245] AH00163: Apache/2.4.6 (Red Hat Enterprise Linux) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Thu Jul 20 13:54:23.252621 2017] [core:notice] [pid 930245] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
"controller0_log.tar.gz/var/log/httpd/error_log" 176L, 22934C
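
To relate the MaxRequestWorkers message to the later shutdown, the AH-coded events can be pulled out of error_log with their timestamps. This is a rough sketch under stated assumptions: the line layout is the one shown in the excerpt above, timestamps parse with an English locale, and the AH00170 graceful shutdown is taken to be the httpd restart used as a workaround (the report does not confirm this). The names EVENT_RE, parse_events and time_from_exhaustion_to_shutdown are hypothetical.

```python
import re
from datetime import datetime

# [timestamp] [module:level] [pid N] AHnnnnn: message
EVENT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)\] (?P<code>AH\d{5}): (?P<msg>.*)$"
)

# Example timestamp: "Thu Jul 20 08:29:11.506494 2017"
TS_FORMAT = "%a %b %d %H:%M:%S.%f %Y"


def parse_events(lines):
    """Return (timestamp, AH code, message) for every AH-coded line."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), TS_FORMAT)
            events.append((ts, match.group("code"), match.group("msg")))
    return events


def time_from_exhaustion_to_shutdown(events):
    """Time between the first AH00161 and the next AH00170, or None.

    AH00161 is the "server reached MaxRequestWorkers" message and
    AH00170 is the graceful shutdown on SIGWINCH, both seen above.
    """
    start = None
    for ts, code, _msg in events:
        if code == "AH00161" and start is None:
            start = ts
        elif code == "AH00170" and start is not None:
            return ts - start
    return None
```

Feeding the lines of error_log to parse_events and then calling time_from_exhaustion_to_shutdown on the result gives a rough measure of how long httpd stayed at its worker limit before it was shut down.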
