+++ This bug was initially created as a clone of Bug #2002659 +++

Description of problem:

An IPv6 job started to fail on the Horizon connectivity test. The dashboard log is full of tracebacks, even in some jobs that are not failing.

Version-Release number of selected component (if applicable):
17.0 RHOS-17.0-RHEL-8-20210908.n.1

Maybe it is actually a configuration issue, but the BZ only allows selecting one component.

--- Additional comment from Radomir Dopieralski on 2021-09-09 16:13:56 CEST ---

Are you trying to run Horizon with Django 3? Because that is not going to work.

--- Additional comment from Radomir Dopieralski on 2021-09-09 16:51:48 CEST ---

Looking at the attached logs, they are all debug messages from Django's templates. You can get rid of them by disabling DEBUG logging for the "django" component in the configuration. I haven't found the actual error that caused that HTTP 500 response, though.

--- Additional comment from Radomir Dopieralski on 2021-09-09 17:14:55 CEST ---

Looks like the real error is this. It means that the memcached server that is required for Horizon to work is not running.

2021-09-09 03:11:58,603 79 ERROR django.request Internal Server Error: /dashboard/auth/login/
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/debug.py", line 76, in sensitive_post_parameters_wrapper
    return view(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 142, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/openstack_auth/views.py", line 148, in login
    redirect_authenticated_user=False)(request)
  File "/usr/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 45, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/debug.py", line 76, in sensitive_post_parameters_wrapper
    return view(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 45, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 142, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/utils/decorators.py", line 45, in _wrapper
    return bound_method(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/contrib/auth/views.py", line 61, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/generic/base.py", line 97, in dispatch
    return handler(request, *args, **kwargs)
"/usr/lib/python3.6/site-packages/django/views/generic/edit.py", line 142, in post return self.form_valid(form) File "/usr/lib/python3.6/site-packages/django/contrib/auth/views.py", line 90, in form_valid auth_login(self.request, form.get_user()) File "/usr/lib/python3.6/site-packages/django/contrib/auth/__init__.py", line 108, in login request.session.cycle_key() File "/usr/lib/python3.6/site-packages/django/contrib/sessions/backends/base.py", line 297, in cycle_key self.create() File "/usr/lib/python3.6/site-packages/django/contrib/sessions/backends/cache.py", line 51, in create "Unable to create a new session key. " RuntimeError: Unable to create a new session key. It is likely that the cache is unavailable. --- Additional comment from Radomir Dopieralski on 2021-09-09 17:15:36 CEST --- Can we look at the memcached logs? --- Additional comment from Attila Fazekas on 2021-09-09 17:28:56 CEST --- The /var/log/containers/memcached/memcached.log empty files on both controllers . The issue might not be 100% reproducible, it happened a few run before as well. --- Additional comment from Radomir Dopieralski on 2021-09-10 12:19:21 CEST --- If memcached crashed, or didn't start at all, that would explain it — it's required for Horizon. --- Additional comment from Attila Fazekas on 2021-09-10 12:24:22 CEST --- Maybe worth to read the iptables/nftables rules and configured addresses. --- Additional comment from Attila Fazekas on 2021-09-10 12:29:15 CEST --- CACHES = { 'default': { 'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache', 'LOCATION': [ 'controller-0.internalapi.redhat.local:11211','controller-1.internalapi.redhat.local:11211','controller-2.internalapi.redhat.local:11211', ], } } it is configured by names, the names might not work. --- Additional comment from Attila Fazekas on 2021-09-10 12:33:50 CEST --- -A INPUT -s fd00:fd00:fd00:2000::/64 -p tcp -m tcp --dport 11211 -m conntrack --ctstate NEW -m comment --comment "121 memcached fd00:fd00:fd00:2000::/64 ipv6" -j ACCEPT If the names resolves to an ipv6 address the the route needs to choose a source address/interface with the above subnet. --- Additional comment from Radomir Dopieralski on 2021-09-10 13:12:25 CEST --- The memcached process do run, and they are bound to ports. The container's healthcheck also seems to be fine. One thing that we noticed is that the addresses that Horizon uses for memcached: 'LOCATION': [ 'controller-0.internalapi.redhat.local:11211','controller-1.internalapi.redhat.local:11211','controller-2.internalapi.redhat.local:11211', ], are mixed up in the /etc/hosts — they point to incorrect controllers. --- Additional comment from Radomir Dopieralski on 2021-09-10 14:25:51 CEST --- I wonder if the messed up /etc/hosts means that the routing is also messed up, and the wrong IPv6 addresses are routed to wrong hosts? Is there some way we could check this? Can we try telneting to the memcached ports from the horizon container, and sending the "version" command? --- Additional comment from Attila Fazekas on 2021-09-10 14:40:31 CEST --- If the names are persistent in any collected log files you may see it. Each nodes var/log/extra contains outputs from extra log collection command. The config files (etc) also persisted from each container. Today unlikely I will have time to check it on a live system, but we can arrange memcached access test on a live system next week. --- Additional comment from Radomir Dopieralski on 2021-09-10 15:41:24 CEST --- Thanks, let's do that then. 
--- Additional comment from Yatin Karel on 2021-09-13 08:11:31 CEST ---

This looks related to what was faced upstream in https://bugs.launchpad.net/tripleo/+bug/1939023, where it was worked around by using memcached IPs instead of hostnames. I am not sure about the root cause of why it didn't work with names, but it would be good to get it working with hostnames too.

--- Additional comment from Attila Fazekas on 2021-09-13 14:47:48 CEST ---

[heat-admin@controller-0 ~]$ sudo podman exec -it horizon bash
[root@controller-0 /]# python3
Python 3.6.8 (default, Aug 12 2021, 07:06:15)
[GCC 8.4.1 20200928 (Red Hat 8.4.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymemcache
>>> pymemcache.client.base.Client('controller-1.internalapi.redhat.local').version()
b'1.5.22'
>>> pymemcache.client.base.Client('controller-0.internalapi.redhat.local').version()
b'1.5.22'
>>> pymemcache.client.base.Client('controller-2.internalapi.redhat.local').version()
b'1.5.22'
>>> pymemcache.client.base.Client('controller-3.internalapi.redhat.local').version()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/pymemcache/client/base.py", line 774, in version
    results = self._misc_cmd([cmd], b'version', False)
  File "/usr/lib/python3.6/site-packages/pymemcache/client/base.py", line 999, in _misc_cmd
    self._connect()
  File "/usr/lib/python3.6/site-packages/pymemcache/client/base.py", line 314, in _connect
    s.IPPROTO_TCP)
  File "/usr/lib64/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The last one was just to check that it really fails on a non-existent host. 11211 is the default port.

On the undercloud:

curl -vv http://[2620:52:0:13b8:5054:ff:fe3e:3e]/dashboard/auth/login/

returns 200 with a GET request. Checking...

--- Additional comment from Attila Fazekas on 2021-09-13 16:45:21 CEST ---

Hmm, horizon/django is actually using a different memcache library:

import memcache
client = memcache.Client(['controller-0.internalapi.redhat.local:11211',
                          'controller-1.internalapi.redhat.local:11211',
                          'controller-2.internalapi.redhat.local:11211'],
                         {'pickleProtocol': 4})
client.set(':1:django.contrib.sessions.cacheb7gd8a62z2t0dpy0hmbdnqmu87bo98xb',
           {'auth_type': 'credentials',
            'unscoped_token': 'gAAAAABhP2DERbZMi_N8y1PeXOco-qSSf7HiCdyyK6L61VN-3-1-dztXt9ssaRXwCScXgvgRVEIFbWSuMXEWjrKK-esSueX5jdJmw_sqeLZCgQSOtja_3sw2yG7DRiTjQHrNnq3TL3ageHVBTiGVKhXIQhpjVVtrOA',
            '_session_expiry': 1800})

MemCached: MemCache: inet:controller-0.internalapi.redhat.local:11211: connect: [Errno -2] Name or service not known. Marking dead.
MemCached: MemCache: inet:controller-2.internalapi.redhat.local:11211: connect: [Errno -2] Name or service not known. Marking dead.
MemCached: MemCache: inet:controller-1.internalapi.redhat.local:11211: connect: [Errno -2] Name or service not known. Marking dead.
0

inet6:[ipv6] works; it does not seem to work with hostnames. So either a different memcached library needs to be used, or the IPv6 addresses have to be passed without the names.
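The difference between the two client libraries can be demonstrated with a small diagnostic sketch. The hypothesis here (consistent with the inet6: observation above, though not confirmed in this thread) is that python-memcached resolves a plain "host:port" entry over IPv4 only, while pymemcache resolves with an unspecified address family, so a name that only has an AAAA record in /etc/hosts fails exactly as seen:

import socket

host = 'controller-0.internalapi.redhat.local'

# AF_INET mimics an IPv4-only lookup; AF_UNSPEC accepts IPv4 or IPv6,
# which is how pymemcache's getaddrinfo() call in the traceback behaves.
for family, label in ((socket.AF_INET, 'IPv4 only'),
                      (socket.AF_UNSPEC, 'any family')):
    try:
        infos = socket.getaddrinfo(host, 11211, family, socket.SOCK_STREAM)
        print(label, '->', sorted({info[4][0] for info in infos}))
    except socket.gaierror as exc:
        # An IPv6-only name fails the IPv4 lookup with
        # [Errno -2] Name or service not known
        print(label, '->', exc)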
--- Additional comment from Attila Fazekas on 2021-09-13 16:46:00 CEST ---

rpm -qf /usr/lib/python3.6/site-packages/memcache.py
python3-memcached-1.59-1.el8ost.1.noarch

--- Additional comment from Attila Fazekas on 2021-09-14 10:31:01 CEST ---

Probably you want to switch to a different memcached library: https://github.com/linsomniac/python-memcached/issues/177

--- Additional comment from Radomir Dopieralski on 2021-09-14 10:55:17 CEST ---

We are not going to switch Django to a different memcached library in this release. Once we switch to Django 3.x in OSP19, we can use a different library.

--- Additional comment from Radomir Dopieralski on 2021-09-14 11:00:36 CEST ---

OK, I was wrong. Looking at https://docs.djangoproject.com/en/2.2/topics/cache/#memcached, we should be able to use django.core.cache.backends.memcached.PyLibMCCache in 2.2.

--- Additional comment from Radomir Dopieralski on 2021-09-14 11:54:10 CEST ---

Looking at upstream puppet-horizon, we should be using IP addresses, not hostnames, in that place in the configuration: https://github.com/openstack/puppet-horizon/blob/master/templates/local_settings.py.erb#L238

--- Additional comment from Attila Fazekas on 2021-09-14 12:07:08 CEST ---

OK, so using IPs there should be a product default, not a CI workaround.

--- Additional comment from Radomir Dopieralski on 2021-10-04 14:41:04 CEST ---

Looks like it was broken by this commit: https://github.com/openstack/puppet-tripleo/commit/49921d57f5753dffe032b9501d1101707ce8cc1e#diff-6c78725a43f0ff6bea9069d7944ef620b167c34fcddbf0529dc000564ab70ba6

--- Additional comment from Radomir Dopieralski on 2021-10-11 13:36:14 CEST ---

The patch reverting the above commit has been merged upstream.
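For reference, the two workarounds discussed in this thread would look roughly like this in Horizon's local_settings.py. This is a sketch only; the IPv6 addresses are placeholders, not the real controller addresses:

# Option 1: keep python-memcached, but pass literal IPv6 addresses using its
# inet6:[address]:port syntax (Django hands LOCATION entries straight to the
# client library). Placeholder addresses below.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            'inet6:[fd00:fd00:fd00:2000::10]:11211',
            'inet6:[fd00:fd00:fd00:2000::11]:11211',
            'inet6:[fd00:fd00:fd00:2000::12]:11211',
        ],
    }
}

# Option 2: switch to the pylibmc backend, which Django 2.2 supports and
# which should be able to resolve the hostnames over IPv6 as well.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
        'LOCATION': [
            'controller-0.internalapi.redhat.local:11211',
            'controller-1.internalapi.redhat.local:11211',
            'controller-2.internalapi.redhat.local:11211',
        ],
    }
}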
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001