Bug 2097372 - In TLS-e setup RadosGW is not configured with SSL certificate
Summary: In TLS-e setup RadosGW is not configured with SSL certificate
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.0
Assignee: Francesco Pantano
QA Contact: Alfredo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-15 14:24 UTC by Pavel Sedlák
Modified: 2022-09-21 12:22 UTC
CC List: 15 users

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-0.20220706080800.feca772.el9ost tripleo-ansible-3.3.1-0.20220706140824.fa5422f.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:22:30 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 846930 0 None stable/wallaby: MERGED tripleo-ansible: Add ssl spec parameter to rgw (I92a35532e9a26a0f8d00660801ee8e8d5a329b8b) 2022-06-29 18:15:06 UTC
OpenStack gerrit 846940 0 None stable/wallaby: MERGED tripleo-ansible: Add ssl spec parameter to ceph_spec module (I3afa2f84a4b6204275ef66a3a5afd70add058f55) 2022-06-29 18:15:11 UTC
OpenStack gerrit 847258 0 None stable/wallaby: MERGED tripleo-heat-templates: Fix rgw ssl_verify option key (I9cb94d68699b5c3a62c82558b51c5ace7ea1ac15) 2022-06-29 18:15:16 UTC
Red Hat Issue Tracker OSP-15769 0 None None None 2022-06-15 14:41:39 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:22:52 UTC

Description Pavel Sedlák 2022-06-15 14:24:46 UTC
In tls-everywhere setup, object storage tests are failing due to haproxy failing to connect to Ceph RadosGW.

In controller-2/var/log/containers/haproxy/haproxy.log can be seen:
> Jun 14 18:28:11 controller-1 haproxy[7]: Server ceph_rgw/controller-1.storage.redhat.local is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 2ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> Jun 14 18:28:11 controller-1 haproxy[7]: Server ceph_rgw/controller-0.storage.redhat.local is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> Jun 14 18:28:11 controller-1 haproxy[7]: Server ceph_rgw/controller-2.storage.redhat.local is DOWN, reason: Layer6 invalid response, info: "SSL handshake failure", check duration: 1ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> Jun 14 18:28:11 controller-1 haproxy[7]: proxy ceph_rgw has no server available!
> Jun 14 18:50:54 controller-1 haproxy[7]: 10.0.0.99:58596 [14/Jun/2022:18:50:54.455] ceph_rgw~ ceph_rgw/<NOSRV> 0/-1/-1/-1/0 503 217 - - SC-- 1/1/0/0/0 0/0 "GET /info HTTP/1.1"

That is all haproxy logs about ceph_rgw (only more <NOSRV> entries follow); the backend is never marked as UP.
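
The layer-6 failure can be reproduced by hand from a controller, mimicking haproxy's TLS health check against one rgw backend (a hedged sketch; the hostname is taken from the log above and the CA file from the haproxy backend configuration shown further below):

# attempt the same TLS handshake haproxy's check performs
openssl s_client -connect controller-1.storage.redhat.local:8080 \
    -CAfile /etc/ipa/ca.crt -verify_return_error </dev/null
# if the backend answers in plain HTTP, the handshake aborts with
# "wrong version number", i.e. the "Layer6 invalid response" haproxy reports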

Because of that, any attempt to use object storage fails, as in this tempest case:
> testtools.testresult.real._StringException: pythonlogging:'': {{{
> 2022-06-14 19:08:38,830 193543 INFO     [tempest.lib.common.rest_client] Request (TestObjectStorageBasicOps:test_swift_basic_ops): 503 GET https://overcloud.redhat.local:13808/swift/v1/AUTH_d29b1bd5121f463e8da63823a5b39615 0.136s
> 2022-06-14 19:08:38,830 193543 DEBUG    [tempest.lib.common.rest_client] Request - Headers: {'X-Auth-Token': '<omitted>'}
>         Body: None
>     Response - Headers: {'content-length': '107', 'cache-control': 'no-cache', 'content-type': 'text/html', 'connection': 'close', 'status': '503', 'content-location': 'https://overcloud.redhat.local:13808/swift/v1/AUTH_d29b1bd5121f463e8da63823a5b39615'}
>         Body: b'<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n'
> }}}
>
> Traceback (most recent call last):
>   File "/usr/lib/python3.9/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
>     return f(*func_args, **func_kwargs)
>   File "/usr/lib/python3.9/site-packages/tempest/scenario/test_object_storage_basic_ops.py", line 37, in test_swift_basic_ops
>     self.get_swift_stat()
>   File "/usr/lib/python3.9/site-packages/tempest/scenario/manager.py", line 1628, in get_swift_stat
>     self.account_client.list_account_containers()
>   File "/usr/lib/python3.9/site-packages/tempest/lib/services/object_storage/account_client.py", line 70, in list_account_containers
>     resp, body = self.get(url, headers={})
>   File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 314, in get
>     return self.request('GET', url, extra_headers, headers)
>   File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 703, in request
>     self._error_checker(resp, resp_body)
>   File "/usr/lib/python3.9/site-packages/tempest/lib/common/rest_client.py", line 883, in _error_checker
>     raise exceptions.UnexpectedResponseCode(str(resp.status),
> tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received
> Details: 503


I'm not exactly sure what the root cause is, but it seems that haproxy is configured to connect to RadosGW via HTTPS/SSL while RGW is listening on plain HTTP:

1) haproxy rgw backends configured
> [root@controller-1 heat-admin]# grep -E 'listen |^ server ' /var/lib/config-data/haproxy/etc/haproxy/haproxy.cfg | head -n 4
> listen ceph_rgw
> server controller-0.storage.redhat.local 172.17.3.23:8080 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost controller-0.storage.redhat.local
> server controller-1.storage.redhat.local 172.17.3.99:8080 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost controller-1.storage.redhat.local
> server controller-2.storage.redhat.local 172.17.3.135:8080 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost controller-2.storage.redhat.local
2) trying it over httpS fails
> [root@controller-1 heat-admin]# curl 'https://172.17.3.99:8080/'
> curl: (35) error:0A00010B:SSL routines::wrong version number
3) trying it over plain http works
> [root@controller-1 heat-admin]# curl http://172.17.3.99:8080/
> <?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
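
To see how the running RGW daemons were actually configured, one can check the rgw_frontends option (a hedged check, assuming cephadm applied the frontend settings through the cluster config database; run on a controller with access to the ceph CLI):

cephadm shell -- ceph config dump | grep rgw_frontends
# on this deployment it is expected to show a plain "beast port=8080" frontend,
# i.e. no ssl_port/ssl_certificate, which matches the curl results above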

According to the service spec reported by ceph orch, the certificate is expected to be used:
> service_type: rgw
> service_id: rgw
> service_name: rgw.rgw
> placement:
>   hosts:
>   - controller-0
>   - controller-1
>   - controller-2
> networks:
> - 172.17.3.0/24
> spec:
>   rgw_frontend_port: 8080
>   rgw_frontend_ssl_certificate: '-----BEGIN CERTIFICATE-----
>     MIIFdTCCA92gAwIBAgIBDTANBgkqhkiG9w0BAQsFADA3MRUwEwYDVQQKDAxSRURI
>     ...
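
Note that the spec carries the certificate but no flag telling cephadm to actually enable the TLS frontend; the linked tripleo-ansible fixes add exactly such an ssl parameter to the generated spec. For comparison, a hedged sketch of what a spec with the frontend secured would look like if applied by hand (field names per cephadm's RGW service spec; certificate body elided):

# write the spec (inside a cephadm shell on a controller, or wherever the ceph CLI works)
cat > rgw_ssl_spec.yml <<'EOF'
service_type: rgw
service_id: rgw
placement:
  hosts:
    - controller-0
    - controller-1
    - controller-2
networks:
  - 172.17.3.0/24
spec:
  rgw_frontend_port: 8080
  ssl: true
  rgw_frontend_ssl_certificate: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
EOF
ceph orch apply -i rgw_ssl_spec.yml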

Comment 6 Pavel Sedlák 2022-06-16 10:06:50 UTC
I forgot to mention which versions are involved; here is a quick list:

rpms on overcloud (controller|ceph):
> cephadm-16.2.7-121.el9cp.noarch
> puppet-haproxy-4.2.2-0.20210812210050.a797b8c.el9ost.noarch
> certmonger-0.79.14-5.el9.x86_64
> puppet-certmonger-2.7.1-0.20210812224230.3e2e660.el9ost.noarch

containers on overcloud:
> # from podman ps
> rh-osbs/rhceph:5-170
> rh-osbs/rhceph@sha256:90e4316d65f4a76fea307705d9b0e4706f05e10a63bf041dbee379c8711db115
> # podman images
> undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph                              5-170            9ea8ac4eae90  2 months ago  1.05 GB

rpms on undercloud:
> certmonger-0.79.14-5.el9.x86_64
> openstack-tripleo-heat-templates-14.3.1-0.20220607161058.ced328c.el9ost.noarch
> tripleo-ansible-3.3.1-0.20220607162207.ae139c3.el9ost.noarch
> ansible-core-2.12.2-1.el9.x86_64
> ansible-collection-ansible-posix-1.2.0-1.3.el9ost.noarch
> ansible-collection-community-general-4.0.0-1.1.el9ost.noarch
> ansible-collection-containers-podman-1.9.3-1.el9ost.noarch
> ansible-role-container-registry-1.4.1-0.20220506220849.57da845.el9ost.noarch
> ansible-role-redhat-subscription-1.2.1-0.20220529221557.ef52a27.el9ost.noarch
> ansible-tripleo-ipsec-11.0.1-0.20210910011424.b5559c8.el9ost.noarch
> ansible-collection-ansible-utils-2.3.0-2.el9ost.noarch
> ansible-collection-ansible-netcommon-2.2.0-1.2.el9ost.noarch
> ansible-config_template-1.2.2-0.20220427223824.78e7f22.el9ost.noarch
> ansible-role-atos-hsm-1.0.1-0.20210908111811.ccd3896.el9ost.noarch
> ansible-role-chrony-1.2.1-0.20220607160358.7ccf873.el9ost.noarch
> ansible-role-collectd-config-0.0.2-0.20220204170819.1992666.el9ost.noarch
> ansible-role-lunasa-hsm-1.1.1-0.20210908110336.6ebc8f4.el9ost.noarch
> ansible-role-qdr-config-0.0.1-0.20210908110336.b456651.el9ost.noarch
> ansible-role-thales-hsm-1.0.1-0.20210908120803.e0f4569.el9ost.noarch
> ansible-freeipa-1.6.3-1.el9.noarch
> ansible-tripleo-ipa-0.2.3-0.20220301190449.6b0ed82.el9ost.noarch
> ansible-role-tripleo-modify-image-1.3.1-0.20220216001439.30d23d5.el9ost.noarch
> ansible-collections-openstack-1.8.0-0.20220513060934.5bb8312.el9ost.noarch
> ansible-pacemaker-1.0.4-0.20210910010919.666f706.el9ost.noarch
> ansible-role-openstack-operations-0.0.1-0.20210915011315.2ab288f.el9ost.noarch
> ansible-role-metalsmith-deployment-1.4.3-0.20220223021106.324b758.el9ost.noarch

Comment 20 errata-xmlrpc 2022-09-21 12:22:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

