Bug 2121634 - Designate DNS - CI fails on: "Overcloud deploy stage" when the Job is configured to be "TLS everywhere"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-designate
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 17.0
Assignee: Michael Johnson
QA Contact: Toni Freger
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-26 05:33 UTC by Arkady Shtempler
Modified: 2023-01-25 12:29 UTC (History)

Fixed In Version: openstack-designate-12.1.0-0.20220923170221.f255747.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, the Red Hat OpenStack Platform (RHOSP) DNS service (designate) was unable to start its central process when TLS-everywhere was enabled. This was caused by an inability to connect to Redis over TLS. With this update in RHOSP 17.0.1, this issue has been resolved.
Clone Of:
Environment:
Last Closed: 2023-01-25 12:28:51 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github eventlet eventlet issues | 692 | 0 | None | open | Eventlet raises a different exception on SSL timeouts vs standard Python | 2022-09-07 16:28:57 UTC
Launchpad | 1989020 | 0 | None | None | None | 2022-09-07 18:43:22 UTC
OpenStack gerrit | 856313 | 0 | None | MERGED | Fix Redis connection over TLS | 2022-09-21 19:19:29 UTC
OpenStack gerrit | 857488 | 0 | None | MERGED | Fix Redis connection over TLS | 2022-09-21 19:20:28 UTC
Red Hat Issue Tracker | OSP-18413 | 0 | None | None | None | 2022-08-26 05:34:42 UTC
Red Hat Product Errata | RHBA-2023:0271 | 0 | None | None | None | 2023-01-25 12:29:08 UTC

Description Arkady Shtempler 2022-08-26 05:33:50 UTC
I tried to run the existing Designate CI job with the changes needed to support TLS (--tls-everywhere, etc.).
Unfortunately, the job failed at the Overcloud deployment stage with [1]:
 
2022-08-25 10:53:23.152691 |                                      |    WARNING | ERROR: Can't run container designate_pool_manage
2022-08-25 10:53:23.154644 | 52540037-eae6-8809-5dbf-00000000dd03 |      FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | error={"changed": false, "msg": "Failed containers: designate_pool_manage"}
2022-08-25 10:53:23.155762 | 52540037-eae6-8809-5dbf-00000000dd03 |     TIMING | tripleo_container_manage : Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_5 | controller-0 | 0:42:25.602759 | 64.06s
In OSP17 we only support the default pool, but maybe "designate pool management" (the failed container appears to be from that area) needs some attention in terms of TLS?

There are many errors in the Designate logs: [2]

[1] - http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-openstack-designate-17.0_director-rhel-virthost-3cont_2comp-ipv4-geneve/154/undercloud-0/home/stack/overcloud_install.log.gz
[2] - https://paste.openstack.org/show/bLEkAt6wGW5Unb7vAjwn/

Comment 3 Brent Eagles 2022-08-26 14:56:19 UTC
This could be a bug in tooz redis driver. The redis python module can definitely create the connection:

>>> import redis
>>> r = redis.from_url('rediss://:1NKFZEUgvAECd2vQSq0Jxrb7u@172.17.1.151:6379')
>>> r.ping()
True

>>> r = redis.StrictRedis(password='1NKFZEUgvAECd2vQSq0Jxrb7u', port=6379, host='172.17.1.151', ssl=True)
>>> r.ping()
True

I'll see if I can sort out exactly what's going on.

Comment 4 Brent Eagles 2022-08-26 15:20:43 UTC
It looks like a bug in how the tooz redis driver handles the ssl=true that is passed in as a query parameter. For fun, I took the relevant bits out of the tooz code and did a little test:

>>> import urllib.parse
>>> from oslo_utils import netutils
>>> from oslo_utils import strutils
>>> 
>>> parsed_url = netutils.urlsplit('redis://:1NKFZEUgvAECd2vQSq0Jxrb7u@172.17.1.151:6379/?ssl=true')
>>> print(parsed_url)
_ModifiedSplitResult(scheme='redis', netloc=':1NKFZEUgvAECd2vQSq0Jxrb7u@172.17.1.151:6379', path='/', query='ssl=true', fragment='')
>>> parsed_qs = urllib.parse.parse_qs(parsed_url.query)
>>> print(parsed_qs)
{'ssl': ['true']}
>>> 
>>> # As it appears in the tooz redis driver
... print(strutils.bool_from_string(parsed_qs['ssl']))
False
>>> 
>>> # Probably the correct code
... print(strutils.bool_from_string(parsed_qs['ssl'][0]))
True

Comment 5 Brent Eagles 2022-08-29 14:04:03 UTC
It occurred to me that I should check that urllib is behaving as expected. According to the docs it is:

from https://docs.python.org/3/library/urllib.parse.html

" urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

Parse a query string given as a string argument (data of type application/x-www-form-urlencoded). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name."

AFAICT the tooz redis driver has had this problem for a while.

Comment 6 Brent Eagles 2022-08-29 14:32:43 UTC
It seems the other tooz drivers use a utility function to handle this kind of thing. I've submitted an upstream bug, https://bugs.launchpad.net/python-tooz/+bug/1988059, and a patch, https://review.opendev.org/c/openstack/tooz/+/855039.

Comment 7 Brent Eagles 2022-08-29 16:31:59 UTC
This was a false lead. The tooz redis driver uses a helper function in a base class to properly pre-process the arguments so the bug isn't there.
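
As a rough illustration of the pre-processing described above, the sketch below collapses each list returned by parse_qs to a single string before it is interpreted as a boolean. It uses only the standard library. collapse_options, to_bool, TRUE_STRINGS, and the placeholder URL are hypothetical names made up for this example; they are not the actual tooz helper or the oslo_utils code.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: strings treated as "true" in this sketch (not taken from tooz).
TRUE_STRINGS = {"1", "t", "true", "on", "y", "yes"}


def collapse_options(query):
    """Hypothetical helper: keep only the last value of each query parameter."""
    return {key: values[-1] for key, values in parse_qs(query).items()}


def to_bool(value):
    """Hypothetical helper: interpret a value as a boolean via its string form."""
    return str(value).strip().lower() in TRUE_STRINGS


# Placeholder URL without credentials; only the query string matters here.
url = "redis://localhost:6379/?ssl=true"
raw_options = parse_qs(urlsplit(url).query)            # values are lists
options = collapse_options(urlsplit(url).query)        # values are strings

ssl_from_list = to_bool(raw_options["ssl"])            # list passed by mistake
ssl_enabled = to_bool(options.get("ssl", "false"))     # collapsed string
```

With the collapsed options, the ssl flag is evaluated from a plain string rather than from a one-element list, which is the difference the earlier test in this bug was probing.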

Comment 8 Michael Johnson 2022-09-07 16:34:48 UTC
This appears to be a known issue in eventlet (linked on the BZ). Eventlet is raising the wrong exception for a socket timeout, which redis is relying on.

There is a "monkey-patch the monkey-patch" workaround that resolved the issue in designate central.
I will propose a patch upstream in Designate. I think fixing eventlet will take too long, as OpenStack is using an older version of eventlet because of other issues.
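
To illustrate the kind of mismatch described here, the following standard-library sketch translates an ssl.SSLError that carries a "timed out" message into socket.timeout, which is what a caller written against standard Python sockets would expect. This is only an illustration under stated assumptions: translate_ssl_timeout and fake_green_recv are hypothetical names, the "timed out" message is assumed from the linked eventlet issue, and this is not the patch that was proposed to Designate.

```python
import functools
import socket
import ssl


def translate_ssl_timeout(func):
    """Hypothetical wrapper: re-raise SSL 'timed out' errors as socket.timeout."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ssl.SSLError as exc:
            # Assumption: the timeout is signalled by a "timed out" message.
            if "timed out" in str(exc):
                raise socket.timeout("timed out") from exc
            raise  # any other SSL error is left untouched
    return wrapper


def fake_green_recv():
    """Stand-in for a patched SSL read that times out (no network needed)."""
    raise ssl.SSLError("timed out")


safe_recv = translate_ssl_timeout(fake_green_recv)
try:
    safe_recv()
    outcome = "no error"
except socket.timeout:
    # A caller that only handles socket.timeout now sees the timeout.
    outcome = "socket.timeout"
```

The wrapper only touches errors whose message indicates a timeout, so genuine TLS failures still surface as ssl.SSLError.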

Comment 13 mjohnson 2022-11-10 00:56:12 UTC
@michjohn this was meant for you :)

Comment 21 Lilach Avraham 2022-12-19 14:59:04 UTC
We ran the Designate CI job with the changes needed to support TLS [1], and it passed the "Overcloud" stage.
The job failed at a different stage, and we opened a new BZ for that [2].

I am moving this BZ status to VERIFIED.

[1] - https://rhos-ci-staging-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/network/view/openstack-designate/job/DFG-network-openstack-designate-17.0_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-tls/
[2] - https://bugzilla.redhat.com/show_bug.cgi?id=2154887

Comment 22 Gregory Thiemonge 2023-01-03 15:45:09 UTC
*** Bug 2154887 has been marked as a duplicate of this bug. ***

Comment 27 errata-xmlrpc 2023-01-25 12:28:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.0.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:0271

