In a TLS-everywhere setup, object storage tests are failing because haproxy cannot connect to Ceph RadosGW. On controller-2, in /var/log/containers/haproxy/haproxy.log, the following can be seen:

> Jun 14 18:28:11 controller-1 haproxy[7]: Server ceph_rgw/controller-1.storage.redhat.local is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 2ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> Jun 14 18:28:11 controller-1 haproxy[7]: Server ceph_rgw/controller-0.storage.redhat.local is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> Jun 14 18:28:11 controller-1 haproxy[7]: Server ceph_rgw/controller-2.storage.redhat.local is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 1ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> Jun 14 18:28:11 controller-1 haproxy[7]: proxy ceph_rgw has no server available!
> Jun 14 18:50:54 controller-1 haproxy[7]: 10.0.0.99:58596 [14/Jun/2022:18:50:54.455] ceph_rgw~ ceph_rgw/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 1/1/0/0/0 0/0 "GET /info HTTP/1.1"

and that is all there is about ceph_rgw (only more <NOSRV> entries); the backend never gets marked as UP. Because of that, any attempt to use object storage fails, e.g. in this tempest case:

> testtools.testresult.real._StringException: pythonlogging:'': {{{
> 2022-06-14 19:08:38,830 193543 INFO [tempest.lib.common.rest_client] Request (TestObjectStorageBasicOps:test_swift_basic_ops): 503 GET https://overcloud.redhat.local:13808/swift/v1/AUTH_d29b1bd5121f463e8da63823a5b39615 0.136s
> 2022-06-14 19:08:38,830 193543 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'X-Auth-Token': '<omitted>'}
> Body: None
> Response - Headers: {'content-length': '107', 'cache-control': 'no-cache', 'content-type': 'text/html', 'connection': 'close', 'status': '503', 'content-location': 'https://overcloud.redhat.local:13808/swift/v1/AUTH_d29b1bd5121f463e8da63823a5b39615'}
> Body: b'<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n'
> }}}
>
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
>     return f(*func_args, **func_kwargs)
>   File "/usr/lib/python3.9/site-packages/tempest/scenario/test_object_storage_basic_ops.py", line 37, in test_swift_basic_ops
>     self.get_swift_stat()
>   File "/usr/lib/python3.9/site-packages/tempest/scenario/manager.py", line 1628, in get_swift_stat
>     self.account_client.list_account_containers()
>   File "/usr/lib/python3.9/site-packages/tempest/lib/services/object_storage/account_client.py", line 70, in list_account_containers
>     resp, body = self.get(url, headers={})
>   File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 314, in get
>     return self.request('GET', url, extra_headers, headers)
>   File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 703, in request
>     self._error_checker(resp, resp_body)
>   File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 883, in _error_checker
>     raise exceptions.UnexpectedResponseCode(str(resp.status),
> tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received
> Details: 503
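For completeness, the layer-6 failure haproxy reports can presumably be reproduced by hand with openssl against one of the rgw backends (address taken from the haproxy config quoted further down); this is only a sketch of the check, not output from this environment:

openssl s_client -connect 172.17.3.99:8080 -CAfile /etc/ipa/ca.crt </dev/null

If the backend is not actually speaking TLS, this should abort with a handshake/protocol error, consistent with the "SSL handshake failure" in the haproxy log above.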
I'm not exactly sure what the root cause behind it is, but it seems that haproxy is configured to connect to radosgw via https/ssl while rgw is listening on plain http:

1) haproxy rgw backends are configured with ssl:

> [root@controller-1 heat-admin]# grep 'listen |^ server ' /var/lib/config-data/haproxy/etc/haproxy/haproxy.cfg | head -n 4
> listen ceph_rgw
>   server controller-0.storage.redhat.local 172.17.3.23:8080 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost controller-0.storage.redhat.local
>   server controller-1.storage.redhat.local 172.17.3.99:8080 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost controller-1.storage.redhat.local
>   server controller-2.storage.redhat.local 172.17.3.135:8080 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost controller-2.storage.redhat.local

2) trying it over httpS fails:

> [root@controller-1 heat-admin]# curl 'https://172.17.3.99:8080/'
> curl: (35) error:0A00010B:SSL routines::wrong version number

3) trying it over plain http works:

> [root@controller-1 heat-admin]# curl http://172.17.3.99:8080/
> <?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

According to the ceph orch service spec, the certificate is expected to be used:

> service_type: rgw
> service_id: rgw
> service_name: rgw.rgw
> placement:
>   hosts:
>   - controller-0
>   - controller-1
>   - controller-2
> networks:
> - 172.17.3.0/24
> spec:
>   rgw_frontend_port: 8080
>   rgw_frontend_ssl_certificate: '-----BEGIN CERTIFICATE-----
>     MIIFdTCCA92gAwIBAgIBDTANBgkqhkiG9w0BAQsFADA3MRUwEwYDVQQKDAxSRURI
> ...
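So the spec carries a frontend certificate, but rgw itself is still answering plain http on 8080. Assuming the problem is simply that the rgw frontend was never switched to TLS, one possible direction (a sketch based on the upstream cephadm rgw service spec fields, not verified here, and in a director-managed deployment the spec would have to come from the tripleo-generated definition rather than be applied by hand) would be to also set ssl: true in the spec:

cat > rgw_ssl.yaml <<'EOF'
service_type: rgw
service_id: rgw
placement:
  hosts:
  - controller-0
  - controller-1
  - controller-2
networks:
- 172.17.3.0/24
spec:
  rgw_frontend_port: 8080
  ssl: true   # assumption: without this the beast frontend keeps listening on plain http
  rgw_frontend_ssl_certificate: |
    -----BEGIN CERTIFICATE-----
    (same certificate the current spec already contains)
    -----END CERTIFICATE-----
EOF
ceph orch apply -i rgw_ssl.yaml

The other direction would be to drop the "ssl ... verify required" options from the ceph_rgw backend servers in haproxy.cfg so that the health check and traffic stay on plain http, but that would work against the point of tls-everywhere on the storage network.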
Forgot to mention which versions are involved, here is a quick list:

rpms on overcloud (controller|ceph):
> cephadm-16.2.7-121.el9cp.noarch
> puppet-haproxy-4.2.2-0.20210812210050.a797b8c.el9ost.noarch
> certmonger-0.79.14-5.el9.x86_64
> puppet-certmonger-2.7.1-0.20210812224230.3e2e660.el9ost.noarch

containers on overcloud:
> # from podman ps
> rh-osbs/rhceph:5-170
> rh-osbs/rhceph@sha256:90e4316d65f4a76fea307705d9b0e4706f05e10a63bf041dbee379c8711db115
> # podman images
> undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph 5-170 9ea8ac4eae90 2 months ago 1.05 GB

rpms on undercloud:
> certmonger-0.79.14-5.el9.x86_64
> openstack-tripleo-heat-templates-14.3.1-0.20220607161058.ced328c.el9ost.noarch
> tripleo-ansible-3.3.1-0.20220607162207.ae139c3.el9ost.noarch
> ansible-core-2.12.2-1.el9.x86_64
> ansible-collection-ansible-posix-1.2.0-1.3.el9ost.noarch
> ansible-collection-community-general-4.0.0-1.1.el9ost.noarch
> ansible-collection-containers-podman-1.9.3-1.el9ost.noarch
> ansible-role-container-registry-1.4.1-0.20220506220849.57da845.el9ost.noarch
> ansible-role-redhat-subscription-1.2.1-0.20220529221557.ef52a27.el9ost.noarch
> ansible-tripleo-ipsec-11.0.1-0.20210910011424.b5559c8.el9ost.noarch
> ansible-collection-ansible-utils-2.3.0-2.el9ost.noarch
> ansible-collection-ansible-netcommon-2.2.0-1.2.el9ost.noarch
> ansible-config_template-1.2.2-0.20220427223824.78e7f22.el9ost.noarch
> ansible-role-atos-hsm-1.0.1-0.20210908111811.ccd3896.el9ost.noarch
> ansible-role-chrony-1.2.1-0.20220607160358.7ccf873.el9ost.noarch
> ansible-role-collectd-config-0.0.2-0.20220204170819.1992666.el9ost.noarch
> ansible-role-lunasa-hsm-1.1.1-0.20210908110336.6ebc8f4.el9ost.noarch
> ansible-role-qdr-config-0.0.1-0.20210908110336.b456651.el9ost.noarch
> ansible-role-thales-hsm-1.0.1-0.20210908120803.e0f4569.el9ost.noarch
> ansible-freeipa-1.6.3-1.el9.noarch
> ansible-tripleo-ipa-0.2.3-0.20220301190449.6b0ed82.el9ost.noarch
> ansible-role-tripleo-modify-image-1.3.1-0.20220216001439.30d23d5.el9ost.noarch
> ansible-collections-openstack-1.8.0-0.20220513060934.5bb8312.el9ost.noarch
> ansible-pacemaker-1.0.4-0.20210910010919.666f706.el9ost.noarch
> ansible-role-openstack-operations-0.0.1-0.20210915011315.2ab288f.el9ost.noarch
> ansible-role-metalsmith-deployment-1.4.3-0.20220223021106.324b758.el9ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543