Bug 2002659 - osp17 dashboard 500 and many errors
Summary: osp17 dashboard 500 and many errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: ---
Assignee: Radomir Dopieralski
QA Contact: ikanias
URL:
Whiteboard:
Depends On:
Blocks: 2046284
 
Reported: 2021-09-09 12:58 UTC by Attila Fazekas
Modified: 2022-09-21 12:17 UTC
CC List: 7 users

Fixed In Version: puppet-tripleo-14.2.3-0.20211030221907.4c2c990.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2046284 (view as bug list)
Environment:
Last Closed: 2022-09-21 12:17:06 UTC
Target Upstream Version:
Embargoed:


Attachments
horizon.log.gz (57.61 KB, application/gzip)
2021-09-09 12:58 UTC, Attila Fazekas
no flags
local_settings.gz (9.08 KB, application/gzip)
2021-09-09 12:59 UTC, Attila Fazekas
no flags


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 812349 0 None NEW Revert Horizon's Memcached config to use IP addresses 2021-10-04 13:09:16 UTC
Red Hat Issue Tracker DFGUI-1769 0 None Closed [Insights/Rule/Bug] Rule KDUMP_NOT_START_ON_AZURE_AND_HYPERV inconsistently reported 2022-06-20 09:09:28 UTC
Red Hat Issue Tracker OSP-8704 0 None None None 2021-11-15 12:50:29 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:17:40 UTC

Description Attila Fazekas 2021-09-09 12:58:33 UTC
Created attachment 1821806 [details]
horizon.log.gz

Description of problem:
An IPv6 job started to fail on the Horizon connectivity test.

The dashboard log is full of tracebacks, even in some non-failing jobs.


Version-Release number of selected component (if applicable):
17.0
RHOS-17.0-RHEL-8-20210908.n.1


Maybe it is actually a configuration issue, but the BZ only allows selecting one component.

Comment 1 Attila Fazekas 2021-09-09 12:59:37 UTC
Created attachment 1821807 [details]
local_settings.gz

Comment 3 Radomir Dopieralski 2021-09-09 14:13:56 UTC
Are you trying to run Horizon with Django 3? Because that is not going to work.

Comment 4 Radomir Dopieralski 2021-09-09 14:51:48 UTC
Looking at the attached logs, they are all debugging errors from Django's templates. You can get rid of them by disabling DEBUG logging for the "django" component in the configuration. I haven't found the actual error that caused that HTTP 500 response, though.
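(For reference, a minimal sketch of what raising the log level of the "django" logger could look like in local_settings.py; the deployed file may already define a LOGGING dict, in which case only the existing "django" entry needs adjusting:)

# Sketch only: raise the "django" logger above DEBUG so the template
# resolution messages stop flooding horizon.log.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': 'INFO',  # anything above DEBUG silences the template noise
        },
    },
}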

Comment 5 Radomir Dopieralski 2021-09-09 15:14:55 UTC
Looks like the real error is the one below. It means that the memcached server that Horizon requires is not running.


2021-09-09 03:11:58,603 79 ERROR django.request Internal Server Error: /dashboard/auth/login/
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/debug.py", line 76, in sensitive_post_parameters_wrapper
    return view(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 142, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/openstack_auth/views.py", line 148, in login
    redirect_authenticated_user=False)(request)
  File "/usr/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 45, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/debug.py", line 76, in sensitive_post_parameters_wrapper
    return view(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 45, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 142, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 45, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/contrib/auth/views.py", line 61, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/generic/base.py", line 97, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/generic/edit.py", line 142, in post
    return self.form_valid(form)
  File "/usr/lib/python3.6/site-packages/django/contrib/auth/views.py", line 90, in form_valid
    auth_login(self.request, form.get_user())
  File "/usr/lib/python3.6/site-packages/django/contrib/auth/__init__.py", line 108, in login
    request.session.cycle_key()
  File "/usr/lib/python3.6/site-packages/django/contrib/sessions/backends/base.py", line 297, in cycle_key
    self.create()
  File "/usr/lib/python3.6/site-packages/django/contrib/sessions/backends/cache.py", line 51, in create
    "Unable to create a new session key. "
RuntimeError: Unable to create a new session key. It is likely that the cache is unavailable.
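
(A quick way to confirm whether the configured cache is reachable from the horizon container is to exercise it directly through Django's cache API; a sketch, assuming the container's Python environment and that the settings module is openstack_dashboard.settings, which has not been verified here:)

# Sketch: probe the configured cache backend directly. With the
# python-memcached backend, set() fails silently when the servers are
# unreachable and get() then returns None.
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'openstack_dashboard.settings')  # assumed module name
import django
django.setup()
from django.core.cache import cache
cache.set('bz2002659-probe', 'ok', 30)
print(cache.get('bz2002659-probe'))  # expect 'ok'; None means the cache is unavailable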

Comment 6 Radomir Dopieralski 2021-09-09 15:15:36 UTC
Can we look at the memcached logs?

Comment 7 Attila Fazekas 2021-09-09 15:28:56 UTC
The /var/log/containers/memcached/memcached.log files are empty on both controllers.

The issue might not be 100% reproducible; it happened in a few earlier runs as well.

Comment 9 Radomir Dopieralski 2021-09-10 10:19:21 UTC
If memcached crashed, or didn't start at all, that would explain it — it's required for Horizon.

Comment 10 Attila Fazekas 2021-09-10 10:24:22 UTC
It may be worth reading the iptables/nftables rules and the configured addresses.

Comment 11 Attila Fazekas 2021-09-10 10:29:15 UTC
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            'controller-0.internalapi.redhat.local:11211',
            'controller-1.internalapi.redhat.local:11211',
            'controller-2.internalapi.redhat.local:11211',
        ],
    }
}

It is configured with hostnames; the names might not work.

Comment 12 Attila Fazekas 2021-09-10 10:33:50 UTC
-A INPUT -s fd00:fd00:fd00:2000::/64 -p tcp -m tcp --dport 11211 -m conntrack --ctstate NEW -m comment --comment "121 memcached fd00:fd00:fd00:2000::/64 ipv6" -j ACCEPT

If the names resolve to an IPv6 address, then the route needs to choose a source address/interface within the above subnet.
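
(One way to see what those names actually resolve to from inside the horizon container is a plain getaddrinfo check; a sketch, with the hostnames copied from the CACHES config above:)

# Sketch: check what the memcached hostnames resolve to (A vs. AAAA) from the
# horizon container, since source-address selection depends on the result.
import socket
for host in ('controller-0.internalapi.redhat.local',
             'controller-1.internalapi.redhat.local',
             'controller-2.internalapi.redhat.local'):
    try:
        for family, _, _, _, sockaddr in socket.getaddrinfo(host, 11211, proto=socket.IPPROTO_TCP):
            print(host, socket.AddressFamily(family).name, sockaddr[0])
    except socket.gaierror as exc:
        print(host, 'does not resolve:', exc)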

Comment 13 Radomir Dopieralski 2021-09-10 11:12:25 UTC
The memcached processes do run, and they are bound to their ports. The container's healthcheck also seems to be fine.

One thing that we noticed is that the addresses that Horizon uses for memcached:

        'LOCATION': [ 'controller-0.internalapi.redhat.local:11211','controller-1.internalapi.redhat.local:11211','controller-2.internalapi.redhat.local:11211', ],

are mixed up in /etc/hosts: they point to the wrong controllers.

Comment 14 Radomir Dopieralski 2021-09-10 12:25:51 UTC
I wonder if the messed-up /etc/hosts means that the routing is also messed up, and the wrong IPv6 addresses are routed to the wrong hosts? Is there some way we could check this?

Can we try telnetting to the memcached ports from the horizon container and sending the "version" command?
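
(If telnet is not available in the container, the same check can be done with a few lines of Python; a sketch of sending the memcached "version" command to each controller:)

# Sketch: emulate `telnet <host> 11211` followed by "version" from inside the
# horizon container. memcached answers with something like b'VERSION 1.5.22\r\n'.
import socket
for host in ('controller-0.internalapi.redhat.local',
             'controller-1.internalapi.redhat.local',
             'controller-2.internalapi.redhat.local'):
    try:
        with socket.create_connection((host, 11211), timeout=5) as s:
            s.sendall(b'version\r\n')
            print(host, s.recv(64))
    except OSError as exc:
        print(host, 'unreachable:', exc)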

Comment 15 Attila Fazekas 2021-09-10 12:40:31 UTC
If the names show up in any of the collected log files, you may be able to see it there.
Each node's var/log/extra contains the output of the extra log-collection commands.
The config files (etc) from each container are also persisted.

I am unlikely to have time to check it on a live system today, but we can arrange a memcached access test on a live system next week.

Comment 16 Radomir Dopieralski 2021-09-10 13:41:24 UTC
Thanks, let's do that then.

Comment 17 Yatin Karel 2021-09-13 06:11:31 UTC
This looks related to what was faced upstream in https://bugs.launchpad.net/tripleo/+bug/1939023; there it was worked around by using memcached IPs instead of hostnames. I am not sure about the root cause of why it did not work with names, but it would be good to get it working with hostnames too.

Comment 18 Attila Fazekas 2021-09-13 12:47:48 UTC
[heat-admin@controller-0 ~]$ sudo podman exec -it horizon bash
[root@controller-0 /]# python3
Python 3.6.8 (default, Aug 12 2021, 07:06:15) 
[GCC 8.4.1 20200928 (Red Hat 8.4.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymemcache
>>> pymemcache.client.base.Client('controller-1.internalapi.redhat.local').version()
b'1.5.22'
>>> pymemcache.client.base.Client('controller-0.internalapi.redhat.local').version()
b'1.5.22'
>>> pymemcache.client.base.Client('controller-2.internalapi.redhat.local').version()
b'1.5.22'
>>> pymemcache.client.base.Client('controller-3.internalapi.redhat.local').version()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/pymemcache/client/base.py", line 774, in version
    results = self._misc_cmd([cmd], b'version', False)
  File "/usr/lib/python3.6/site-packages/pymemcache/client/base.py", line 999, in _misc_cmd
    self._connect()
  File "/usr/lib/python3.6/site-packages/pymemcache/client/base.py", line 314, in _connect
    s.IPPROTO_TCP)
  File "/usr/lib64/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known


The last one was just to check that it really fails on a non-existent host.
11211 is the default port.

On the undercloud:
curl -vv http://[2620:52:0:13b8:5054:ff:fe3e:3e]/dashboard/auth/login/
returns 200 with a GET request.

Checking...

Comment 19 Attila Fazekas 2021-09-13 14:45:21 UTC
Hmm, Horizon/Django is actually using a different memcache library.

import memcache
client = memcache.Client(['controller-0.internalapi.redhat.local:11211', 'controller-1.internalapi.redhat.local:11211', 'controller-2.internalapi.redhat.local:11211'], {'pickleProtocol': 4})
client.set(':1:django.contrib.sessions.cacheb7gd8a62z2t0dpy0hmbdnqmu87bo98xb', {'auth_type': 'credentials', 'unscoped_token': 'gAAAAABhP2DERbZMi_N8y1PeXOco-qSSf7HiCdyyK6L61VN-3-1-dztXt9ssaRXwCScXgvgRVEIFbWSuMXEWjrKK-esSueX5jdJmw_sqeLZCgQSOtja_3sw2yG7DRiTjQHrNnq3TL3ageHVBTiGVKhXIQhpjVVtrOA', '_session_expiry': 1800})
MemCached: MemCache: inet:controller-0.internalapi.redhat.local:11211: connect: [Errno -2] Name or service not known.  Marking dead.
MemCached: MemCache: inet:controller-2.internalapi.redhat.local:11211: connect: [Errno -2] Name or service not known.  Marking dead.
MemCached: MemCache: inet:controller-1.internalapi.redhat.local:11211: connect: [Errno -2] Name or service not known.  Marking dead.
0

inet6:[ipv6] works; it does not seem to work with hostnames.

So either a different memcached library needs to be used, or the IPv6 addresses need to be passed instead of the names.
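
(For the record, the form that worked in this test was python-memcached's explicit inet6: prefix with a bracketed literal address; a sketch with a placeholder address, the real controller addresses are deployment-specific:)

# Sketch: python-memcached accepts 'inet6:[<addr>]:<port>' server strings,
# which bypasses the hostname handling that fails above.
import memcache
client = memcache.Client(['inet6:[fd00:fd00:fd00:2000::10]:11211'])  # placeholder address, not a real controller
client.set('probe', 'ok')
print(client.get('probe'))  # 'ok' if the server is reachable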

Comment 20 Attila Fazekas 2021-09-13 14:46:00 UTC
rpm -qf /usr/lib/python3.6/site-packages/memcache.py
python3-memcached-1.59-1.el8ost.1.noarch

Comment 21 Attila Fazekas 2021-09-14 08:31:01 UTC
You probably want to switch to a different memcached library:
https://github.com/linsomniac/python-memcached/issues/177

Comment 22 Radomir Dopieralski 2021-09-14 08:55:17 UTC
We are not going to switch Django to a different memcached library in this release.

Once we switch to Django 3.x in OSP19 we can use a different library.

Comment 23 Radomir Dopieralski 2021-09-14 09:00:36 UTC
OK, I was wrong: looking at https://docs.djangoproject.com/en/2.2/topics/cache/#memcached, we should be able to use django.core.cache.backends.memcached.PyLibMCCache in 2.2.
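
(A sketch of what that alternative backend would look like in local_settings.py, with the LOCATION entries from comment 11; switching assumes pylibmc is actually available in the horizon image, which has not been verified here:)

# Sketch only: Django 2.2's pylibmc-based backend as an alternative to
# django.core.cache.backends.memcached.MemcachedCache. Assumes the pylibmc
# package is installed in the horizon container.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': [
            'controller-0.internalapi.redhat.local:11211',
            'controller-1.internalapi.redhat.local:11211',
            'controller-2.internalapi.redhat.local:11211',
        ],
    },
}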

Comment 24 Radomir Dopieralski 2021-09-14 09:54:10 UTC
Looking at upstream puppet-horizon, we should be using IP addresses, not URLs, in that place in the configuration: https://github.com/openstack/puppet-horizon/blob/master/templates/local_settings.py.erb#L238

Comment 25 Attila Fazekas 2021-09-14 10:07:08 UTC
OK, so using IPs there should be the product default, not a CI workaround.

Comment 27 Radomir Dopieralski 2021-10-11 11:36:14 UTC
The patch reverting the above commit has been merged upstream.

Comment 32 errata-xmlrpc 2022-09-21 12:17:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

