Bug 1473710

Summary: Keystone periodically goes down leaving cloud unusable.
Product: Red Hat OpenStack Reporter: Jeremy <jmelvin>
Component: openstack-keystone Assignee: John Dennis <jdennis>
Status: CLOSED NOTABUG QA Contact: nlevinki <nlevinki>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 10.0 (Newton) CC: bshephar, guangjian, nkinder, panbalag, srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-07 15:29:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jeremy 2017-07-21 13:24:53 UTC
Description of problem: There seems to be an issue with keystone running within httpd. When horizon stops working, so do stack commands such as nova list, and those commands return keystone errors. When the problem occurs, the nova logs also show "failed to fetch token from identity server".


Version-Release number of selected component (if applicable):
openstack-keystone-10.0.0-4.el7ost.noarch
httpd-2.4.6-45.el7.x86_64                                  



How reproducible:
Unknown. The customer encounters the issue, and nothing in the stack works. The workaround is to restart httpd, after which it works again. After some time, usually a couple of hours, and some unknown trigger, it happens again.


Actual results:
keystone goes down randomly

Expected results:
keystone stays up.

Additional info:

Debug logging was enabled for keystone; however, we see nothing in keystone.log.

In /var/log/httpd/keystone_wsgi_admin_error.log we see many errors, mostly repetitions of this:
[Thu Jul 20 06:44:04.233724 2017] [:error] [pid 936859]   File "/usr/lib64/python2.7/contextlib.py", line 84, in helper
[Thu Jul 20 06:44:04.233736 2017] [:error] [pid 936859] <type 'exceptions.TypeError'>: 'NoneType' object is not callable
[Thu Jul 20 06:44:04.289042 2017] [:error] [pid 936856] Exception in thread Thread-1 (most likely raised during interpreter shutdown):
[Thu Jul 20 06:44:04.289107 2017] [:error] [pid 936856] Traceback (most recent call last):
[Thu Jul 20 06:44:04.289122 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
[Thu Jul 20 06:44:04.289130 2017] [:error] [pid 936856]   File "/usr/lib64/python2.7/threading.py", line 764, in run
[Thu Jul 20 06:44:04.289139 2017] [:error] [pid 936856]   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 985, in _heartbeat_thread_job
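
To see which worker processes emit these tracebacks, the log lines can be grouped by pid. The following is only a minimal sketch, assuming every line follows the "[timestamp] [module:level] [pid N] message" layout shown above; the names group_by_pid and pids_with_shutdown_errors are hypothetical helpers, not part of keystone, httpd, or mod_wsgi.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the excerpt above:
# [Thu Jul 20 06:44:04.289042 2017] [:error] [pid 936856] Exception in thread ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<module>[^\]]*)\] \[pid (?P<pid>\d+)\] ?(?P<msg>.*)$"
)


def group_by_pid(lines):
    """Return {pid: [message, ...]} for every line that matches LINE_RE."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            grouped[int(match.group("pid"))].append(match.group("msg").strip())
    return dict(grouped)


def pids_with_shutdown_errors(grouped):
    """Return the sorted pids whose messages mention interpreter shutdown."""
    return sorted(
        pid
        for pid, messages in grouped.items()
        if any("interpreter shutdown" in message for message in messages)
    )
```

Reading the file with open(path).readlines() and passing the result to group_by_pid should be enough to list the affected pids. Lines that do not match the assumed layout are skipped.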


### Same thing in /var/log/httpd/keystone_wsgi_main_error.log

### In /var/log/httpd/error_log we see repeated:

[Thu Jul 20 06:44:15.874675 2017] [core:notice] [pid 373561] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Thu Jul 20 08:29:11.506494 2017] [mpm_prefork:error] [pid 373561] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
[Thu Jul 20 13:52:44.873645 2017] [mpm_prefork:notice] [pid 373561] AH00170: caught SIGWINCH, shutting down gracefully
[Thu Jul 20 13:54:23.156341 2017] [core:notice] [pid 930245] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Thu Jul 20 13:54:23.157703 2017] [suexec:notice] [pid 930245] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Jul 20 13:54:23.164645 2017] [auth_digest:notice] [pid 930245] AH01757: generating secret for digest authentication ...
[Thu Jul 20 13:54:23.252579 2017] [mpm_prefork:notice] [pid 930245] AH00163: Apache/2.4.6 (Red Hat Enterprise Linux) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Thu Jul 20 13:54:23.252621 2017] [core:notice] [pid 930245] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
"controller0_log.tar.gz/var/log/httpd/error_log" 176L, 22934C
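
To check whether the "couple of hours" interval lines up with worker exhaustion, the time from each httpd start to the next AH00161 message can be computed from this log. This is a rough sketch, assuming the timestamp format shown above and treating the AH00094 "Command line" notice as the start marker (an assumption for illustration); time_to_saturation is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout: [timestamp] [module:level] [pid N] AHnnnnn: message
EVENT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[[^\]]*\] \[pid \d+\] (?P<code>AH\d{5}):"
)
# Matches timestamps such as "Thu Jul 20 08:29:11.506494 2017"
# (assumes English day and month names).
TS_FORMAT = "%a %b %d %H:%M:%S.%f %Y"


def time_to_saturation(lines):
    """Return timedeltas from each start (AH00094) to the first following AH00161."""
    deltas = []
    started = None
    for line in lines:
        match = EVENT_RE.match(line)
        if not match:
            continue  # lines without an AH code are ignored
        timestamp = datetime.strptime(match.group("ts"), TS_FORMAT)
        code = match.group("code")
        if code == "AH00094":
            started = timestamp
        elif code == "AH00161" and started is not None:
            deltas.append(timestamp - started)
            started = None  # count only the first saturation per start
    return deltas
```

For the excerpt above, the start at 06:44:15 and the MaxRequestWorkers message at 08:29:11 are roughly 1 hour 45 minutes apart.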

Comment 5 guangjian 2020-01-16 06:11:49 UTC
I am hitting the same issue. Any comments on how to resolve it?

Comment 6 Brendan Shephard 2020-02-28 04:42:14 UTC
(In reply to guangjian from comment #5)
> I am hitting the same issue. Any comments on how to resolve it?

In this case, the issue was resolved by doing the following:
https://access.redhat.com/solutions/3032371

And 
https://access.redhat.com/solutions/3392311