I tried to run the existing Designate CI job with the changes needed to support TLS (--tls-everywhere, etc.). Unfortunately, the job failed at the overcloud deployment stage with [1]:

2022-08-25 10:53:23.152691 | | WARNING | ERROR: Can't run container designate_pool_manage
2022-08-25 10:53:23.154644 | 52540037-eae6-8809-5dbf-00000000dd03 | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | error={"changed": false, "msg": "Failed containers: designate_pool_manage"}
2022-08-25 10:53:23.155762 | 52540037-eae6-8809-5dbf-00000000dd03 | TIMING | tripleo_container_manage : Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | 0:42:25.602759 | 64.06s

In OSP17 we only support the default pool, but maybe "designate pool management" (the failed container appears to be from that area) needs some attention in terms of TLS?

There is a high number of errors in the Designate logs: [2]

[1] - http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-openstack-designate-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve/154/undercloud-0/home/stack/overcloud_install.log.gz
[2] - https://paste.openstack.org/show/bLEkAt6wGW5Unb7vAjwn/
This could be a bug in the tooz redis driver. The redis Python module can definitely create the connection:

>>> r = redis.from_url('rediss://:1NKFZEUgvAECd2vQSq0Jxrb7u.1.151:6379')
>>> r.ping()
True
>>> r = redis.StrictRedis(password='1NKFZEUgvAECd2vQSq0Jxrb7u', port=6379, host='172.17.1.151', ssl=True)
>>> r.ping()
True

I'll see if I can sort out exactly what's going on.
It looks like a bug in how the tooz redis driver handles the ssl=true that is passed in as a query parameter. For fun, I took the relevant bits out of the tooz code and did a little test:

>>> import urllib
>>> from oslo_utils import netutils
>>> from oslo_utils import strutils
>>>
>>> parsed_url = netutils.urlsplit('redis://:1NKFZEUgvAECd2vQSq0Jxrb7u.1.151:6379/?ssl=true')
>>> print(parsed_url)
_ModifiedSplitResult(scheme='redis', netloc=':1NKFZEUgvAECd2vQSq0Jxrb7u.1.151:6379', path='/', query='ssl=true', fragment='')
>>> parsed_qs = urllib.parse.parse_qs(parsed_url.query)
>>> print(parsed_qs)
{'ssl': ['true']}
>>>
>>> # As it appears in the tooz redis driver
... print(strutils.bool_from_string(parsed_qs['ssl']))
False
>>>
>>> # Probably the correct code
... print(strutils.bool_from_string(parsed_qs['ssl'][0]))
True
It occurred to me that I should check that urllib is behaving as expected. According to the docs, it is. From https://docs.python.org/3/library/urllib.parse.html:

"urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name."

AFAICT the tooz redis driver has had this problem for a while.
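To illustrate the documented behaviour with the standard library alone, here is a minimal sketch. The query strings are made up for the example (the "timeout" key is only a placeholder) and are not taken from the deployment in this report.

```python
from urllib.parse import parse_qs

# Every value is a list, because a key may legally appear more than once
# in a query string.
single = parse_qs("ssl=true")
repeated = parse_qs("ssl=true&timeout=5&timeout=10")

print(single)    # {'ssl': ['true']}
print(repeated)  # {'ssl': ['true'], 'timeout': ['5', '10']}
```

Any code that wants a scalar option therefore has to pick one element of the list before converting it.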
It seems the other tooz drivers use a utility function to handle this kind of thing. I've submitted an upstream bug, https://bugs.launchpad.net/python-tooz/+bug/1988059, and a patch, https://review.opendev.org/c/openstack/tooz/+/855039.
This was a false lead. The tooz redis driver uses a helper function in a base class to properly pre-process the arguments so the bug isn't there.
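As a rough sketch of what such pre-processing can look like: the helper below flattens the list values returned by parse_qs into scalars before any option is read. The name collapse_options and the placeholder URL are assumptions made for illustration; this is not the actual tooz helper.

```python
from urllib.parse import parse_qs, urlsplit


def collapse_options(options):
    """Hypothetical helper: keep the last value given for each query key."""
    return {key: values[-1] for key, values in options.items()}


# Placeholder URL for illustration, not the one from this report.
url = "redis://:secret@127.0.0.1:6379/?ssl=true&timeout=5"

raw = parse_qs(urlsplit(url).query)   # values are lists
options = collapse_options(raw)       # values are plain strings

print(raw)      # {'ssl': ['true'], 'timeout': ['5']}
print(options)  # {'ssl': 'true', 'timeout': '5'}
```

Once the options are flattened like this, a string-to-boolean conversion receives 'true' rather than ['true'], which is why the driver itself did not need the [0] index.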
This appears to be a known issue in eventlet (linked on the BZ). Eventlet raises the wrong exception for a socket timeout, and redis relies on that exception. There is a "monkey-patch the monkey-patch" workaround that resolved the issue in designate central. I will propose a patch upstream in Designate. I think fixing eventlet would take too long, as OpenStack is pinned to an older version of eventlet because of other issues.
@michjohn this was meant for you :)
We ran the Designate CI job with the changes needed to support TLS [1], and it passed the "Overcloud" stage. The job failed at a different stage, and we opened a new BZ for that [2]. I am moving this BZ status to VERIFIED.

[1] - https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/openstack-designate/job/DFG-network-openstack-designate-17.0_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-tls/
[2] - https://bugzilla.redhat.com/show_bug.cgi?id=2154887
*** Bug 2154887 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 17.0.1 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:0271