Bug 1473710 - Keystone periodically goes down, leaving cloud unusable.
Summary: Keystone periodically goes down, leaving cloud unusable.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-keystone
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: John Dennis
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-07-21 13:24 UTC by Jeremy
Modified: 2020-09-10 11:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-07 15:29:35 UTC
Target Upstream Version:
Embargoed:



Description Jeremy 2017-07-21 13:24:53 UTC
Description of problem: There appears to be an issue with keystone running inside httpd. When horizon stops working, so do stack commands such as nova list. Running stack commands also produces keystone errors, and the nova logs show "failed to fetch token from identity server" when the problem occurs.


Version-Release number of selected component (if applicable):
openstack-keystone-10.0.0-4.el7ost.noarch
httpd-2.4.6-45.el7.x86_64                                  



How reproducible:
Unknown. The customer encounters the issue and nothing in the stack works. The workaround is to restart httpd, after which everything works again. After some time, usually a couple of hours, an unknown trigger causes it to happen again.
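Because the trigger is unknown, it may help to record exactly when keystone stops answering so the time can be matched against the httpd logs. The following is a minimal sketch only, not something that was run in this case. It assumes Python 3 on a monitoring host, and the URL shown in the usage note (hostname, port 5000, /v3 path) is a hypothetical placeholder that must be replaced with the real keystone endpoint.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Issue one GET against url and return (ok, detail).

    ok is True only when the server answers with a 2xx status.
    The opener argument exists so the function can be exercised
    without a network; by default it is urllib.request.urlopen.
    """
    try:
        resp = opener(url, timeout=timeout)
        try:
            return True, "HTTP %d" % resp.getcode()
        finally:
            resp.close()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, "HTTP %d" % exc.code
    except OSError as exc:
        # URLError, connection refused and socket timeouts are all
        # OSError subclasses: no usable answer was received in time.
        return False, "no response: %s" % exc
```

Calling probe("http://controller0:5000/v3") from a loop or a cron job and logging the result with a timestamp would give a first failure time to compare with the entries in /var/log/httpd.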


Actual results:
keystone goes down randomly

Expected results:
keystone stays up.

Additional info:

Debug logging was enabled for keystone, but we see nothing in keystone.log.

In /var/log/httpd/keystone_wsgi_admin_error.log we see many errors, mostly repetitions of this:
[Thu Jul 20 06:44:04.233724 2017] [:error] [pid 936859]   File "/usr/lib64/python2.7/contextlib.py", line 84, in helper
[Thu Jul 20 06:44:04.233736 2017] [:error] [pid 936859] <type 'exceptions.TypeError'>: 'NoneType' object is not callable
[Thu Jul 20 06:44:04.289042 2017] [:error] [pid 936856] Exception in thread Thread-1 (most likely raised during interpreter shutdown):
[Thu Jul 20 06:44:04.289107 2017] [:error] [pid 936856] Traceback (most recent call last):
[Thu Jul 20 06:44:04.289122 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
[Thu Jul 20 06:44:04.289130 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 764, in run
[Thu Jul 20 06:44:04.289139 2017] [:error] [pid 936856]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 985, in _heartbeat_thread_job
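Since these tracebacks are spread over several httpd worker pids, a quick summary per pid can make the volume easier to judge. The following is a rough sketch, assuming Python 3 and the Apache 2.4 error-log line layout shown above; the function name summarize and the SAMPLE variable are illustrative, and SAMPLE simply repeats the lines quoted in this report.

```python
import re
from collections import Counter

# Layout seen above: [timestamp] [module:level] [pid N] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)\] (?P<msg>.*)$"
)

SAMPLE = """
[Thu Jul 20 06:44:04.233724 2017] [:error] [pid 936859]   File "/usr/lib64/python2.7/contextlib.py", line 84, in helper
[Thu Jul 20 06:44:04.233736 2017] [:error] [pid 936859] <type 'exceptions.TypeError'>: 'NoneType' object is not callable
[Thu Jul 20 06:44:04.289042 2017] [:error] [pid 936856] Exception in thread Thread-1 (most likely raised during interpreter shutdown):
[Thu Jul 20 06:44:04.289107 2017] [:error] [pid 936856] Traceback (most recent call last):
[Thu Jul 20 06:44:04.289122 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
[Thu Jul 20 06:44:04.289130 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 764, in run
[Thu Jul 20 06:44:04.289139 2017] [:error] [pid 936856]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 985, in _heartbeat_thread_job
"""


def summarize(lines):
    """Return (error lines per pid, count of each non-frame message)."""
    per_pid = Counter()
    messages = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m or m.group("level") != "error":
            continue  # blank lines, notices, or a different layout
        per_pid[m.group("pid")] += 1
        msg = m.group("msg").strip()
        # Skip stack frames and the traceback header so that only the
        # exception text and thread summaries are counted.
        if msg.startswith("File ") or msg.startswith("Traceback"):
            continue
        messages[msg] += 1
    return per_pid, messages


per_pid, messages = summarize(SAMPLE.splitlines())
for pid, count in per_pid.most_common():
    print(pid, count)
for msg, count in messages.most_common():
    print(count, msg)
```

To run it against the real file, pass an open file object for keystone_wsgi_admin_error.log to summarize instead of SAMPLE.splitlines().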


### The same errors appear in /var/log/httpd/keystone_wsgi_main_error.log

### In /var/log/httpd/error_log we see repeated:

[Thu Jul 20 06:44:15.874675 2017] [core:notice] [pid 373561] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Thu Jul 20 08:29:11.506494 2017] [mpm_prefork:error] [pid 373561] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
[Thu Jul 20 13:52:44.873645 2017] [mpm_prefork:notice] [pid 373561] AH00170: caught SIGWINCH, shutting down gracefully
[Thu Jul 20 13:54:23.156341 2017] [core:notice] [pid 930245] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Thu Jul 20 13:54:23.157703 2017] [suexec:notice] [pid 930245] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Jul 20 13:54:23.164645 2017] [auth_digest:notice] [pid 930245] AH01757: generating secret for digest authentication ...
[Thu Jul 20 13:54:23.252579 2017] [mpm_prefork:notice] [pid 930245] AH00163: Apache/2.4.6 (Red Hat Enterprise Linux) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Thu Jul 20 13:54:23.252621 2017] [core:notice] [pid 930245] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
"controller0_log.tar.gz/var/log/httpd/error_log" 176L, 22934C

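The error_log excerpt above shows AH00161 (MaxRequestWorkers reached) some hours before the graceful shutdown and restart. To line those events up against the reported outages, they can be pulled out with their timestamps. This is a sketch under assumptions: Python 3, an English locale for the day and month names, and the same log layout as the excerpt. The names parse_events, WATCHED and SAMPLE are illustrative, and the script does not establish a root cause.

```python
import re
from datetime import datetime

# Layout seen above: [timestamp] [module:level] [pid N] AHnnnnn: text
EVENT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[[^\]]+\] \[pid (?P<pid>\d+)\] "
    r"(?P<code>AH\d{5}): (?P<text>.*)$"
)

# Message codes taken from the excerpt in this report.
WATCHED = {
    "AH00161": "server reached MaxRequestWorkers",
    "AH00170": "caught SIGWINCH, shutting down gracefully",
    "AH00163": "configured, resuming normal operations",
}

SAMPLE = """
[Thu Jul 20 06:44:15.874675 2017] [core:notice] [pid 373561] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Thu Jul 20 08:29:11.506494 2017] [mpm_prefork:error] [pid 373561] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
[Thu Jul 20 13:52:44.873645 2017] [mpm_prefork:notice] [pid 373561] AH00170: caught SIGWINCH, shutting down gracefully
[Thu Jul 20 13:54:23.156341 2017] [core:notice] [pid 930245] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Thu Jul 20 13:54:23.157703 2017] [suexec:notice] [pid 930245] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Jul 20 13:54:23.164645 2017] [auth_digest:notice] [pid 930245] AH01757: generating secret for digest authentication ...
[Thu Jul 20 13:54:23.252579 2017] [mpm_prefork:notice] [pid 930245] AH00163: Apache/2.4.6 (Red Hat Enterprise Linux) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Thu Jul 20 13:54:23.252621 2017] [core:notice] [pid 930245] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
"""


def parse_events(lines):
    """Return a list of (timestamp, code, pid) for the watched codes."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m or m.group("code") not in WATCHED:
            continue
        when = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S.%f %Y")
        events.append((when, m.group("code"), int(m.group("pid"))))
    return events


for when, code, pid in parse_events(SAMPLE.splitlines()):
    print(when.isoformat(), pid, code, WATCHED[code])
```

In the excerpt, the gap between the AH00161 entry and the following AH00170 entry is a little over five hours, which is the kind of interval worth comparing with the times users reported horizon and the stack commands failing.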
Comment 5 guangjian 2020-01-16 06:11:49 UTC
I meet the same issue, any comments on how to resolve it?

Comment 6 Brendan Shephard 2020-02-28 04:42:14 UTC
(In reply to guangjian from comment #5)
> I meet the same issue, any comments on how to resolve it?

In this case, the issue was resolved by doing the following:
https://access.redhat.com/solutions/3032371

And 
https://access.redhat.com/solutions/3392311

