Bug 1876852 - [External Mode] ceph-external-cluster-details-exporter.py does not tolerate TLS enabled RGW
Summary: [External Mode] ceph-external-cluster-details-exporter.py does not tolerate ...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: arun kumar mohan
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks: 1878853
Reported: 2020-09-08 11:01 UTC by Mauro Oddi
Modified: 2021-04-30 16:45 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1878853
Environment:
Last Closed: 2020-12-16 08:45:24 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift/rook pull 122 | 0 | None | closed | Revert "Bug 1876852: ceph: fix to support TLS enabled rgw-endpoint" | 2021-02-11 07:55:08 UTC
Github | rook/rook pull 6227 | 0 | None | closed | ceph: fix to support TLS enabled rgw-endpoint | 2021-02-11 07:55:08 UTC

Description Mauro Oddi 2020-09-08 11:01:45 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When deploying OpenShift Container Storage 4.5 in external mode with a TLS-enabled RGW endpoint (or an RGW behind a TLS-enabled HAProxy, as suggested in the documentation [1]), the script ceph-external-cluster-details-exporter.py fails.


[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/object_gateway_configuration_and_administration_guide/index#rgw-configuring-ha-proxy-keepalived-rgw


Version of all relevant components (if applicable):


$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.7     True        False         5d12h   Cluster version is 4.5.7


$ oc get csv -n openshift-storage
NAME                                           DISPLAY                       VERSION                 REPLACES   PHASE
elasticsearch-operator.4.5.0-202008100413.p0   Elasticsearch Operator        4.5.0-202008100413.p0              Succeeded
ocs-operator.v4.5.0-545.ci                     OpenShift Container Storage   4.5.0-545.ci                       Succeeded


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

To continue with the deployment, an unsecured (plain HTTP) RGW endpoint has to be configured, which has information security implications.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:
N/A

Steps to Reproduce:
1. Have a running OCP 4.5
2. Have a running Ceph 4.1 with TLS-enabled RGW
3. Enable the OCS CatalogSource for the latest OCS 4.5 build
4. Install operator
5. Create OCS Cluster Service in Independent Mode
6. Attempt to collect the cluster details with ceph-external-cluster-details-exporter.py 
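Step 2 assumes the RGW endpoint really terminates TLS. A minimal sketch for confirming that before step 6 is a bare TLS handshake, assuming the endpoint is written in the same host:port form passed to --rgw-endpoint in the command further down. The helper names `split_endpoint` and `speaks_tls` are hypothetical and are not part of the exporter script:

```python
import socket
import ssl
from typing import Tuple


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split 'host:port' (e.g. '192.168.122.10:443') into (host, port).

    Hypothetical helper; not taken from ceph-external-cluster-details-exporter.py.
    """
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return host.strip("[]"), int(port)  # drop brackets around IPv6 literals


def speaks_tls(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with the endpoint completes.

    Certificate checks are disabled on purpose: this only answers
    "does this port speak TLS?", not "is the certificate trusted?".
    """
    host, port = split_endpoint(endpoint)
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True  # handshake finished inside wrap_socket
    except OSError:  # covers ssl.SSLError, refused connections and timeouts
        return False
```

For the endpoint in this report, `speaks_tls("192.168.122.10:443")` would be expected to return True if the RGW (or HAProxy in front of it) serves TLS on port 443, and False for a plain HTTP listener.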


Actual results:
The connection to the RGW endpoint fails.

Expected results:
The connection to the TLS-enabled RGW should succeed.


Additional info:

$ python3 ceph-external-cluster-details-exporter.py --rgw-endpoint 192.168.122.10:443 --rbd-data-pool-name rbd_k8s_openshift
Excecution Failed: failed to connect to rgw endpoint http://192.168.122.10:443

The endpoint is reachable, though.

Running the same test against an unencrypted HTTP RGW endpoint works as expected.
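The error message shows the script dialing `http://192.168.122.10:443`, i.e. it prefixed the endpoint with a plain-HTTP scheme even though port 443 speaks TLS. A minimal sketch of a scheme-aware check, assuming that trying HTTPS first and falling back to HTTP is acceptable, and that a HEAD request to the RGW root answering with a status below 400 counts as "reachable". The function names are hypothetical; this is not the code from rook PR 6227:

```python
import ssl
import urllib.request
from typing import Callable, List, Optional


def candidate_urls(endpoint: str) -> List[str]:
    """URLs to try for an RGW endpoint given as 'host:port' or as a full URL."""
    if endpoint.startswith(("http://", "https://")):
        return [endpoint]  # the caller chose the scheme explicitly
    # Try HTTPS first so a TLS-enabled RGW (or TLS-terminating HAProxy) is
    # reached securely; fall back to plain HTTP for unencrypted endpoints.
    return [f"https://{endpoint}", f"http://{endpoint}"]


def urllib_opener(url: str, timeout: float = 3.0,
                  context: Optional[ssl.SSLContext] = None) -> int:
    """Send a HEAD request and return the HTTP status code.

    urlopen raises OSError subclasses (URLError, HTTPError, ssl.SSLError,
    timeouts) on connection, TLS, or HTTP 4xx/5xx failures.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout, context=context) as resp:
        return resp.status


def probe_rgw(endpoint: str,
              opener: Callable[[str], int] = urllib_opener) -> Optional[str]:
    """Return the first candidate URL that answers with a non-error status."""
    for url in candidate_urls(endpoint):
        try:
            if opener(url) < 400:
                return url
        except OSError:
            continue  # this scheme failed; try the next one
    return None
```

With the endpoint from the failing run, `probe_rgw("192.168.122.10:443")` would try `https://192.168.122.10:443` before the plain-HTTP URL that failed above. If the RGW certificate is signed by a private CA, `urllib_opener` can be given `ssl.create_default_context(cafile=...)` through a small wrapper; otherwise certificate verification will make the HTTPS attempt fail.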

Comment 2 Sébastien Han 2020-09-08 17:20:20 UTC
Arun, PTAL. This will be in the next z-stream.

Comment 3 arun kumar mohan 2020-09-09 06:36:16 UTC
Ack. Looking into it.

Comment 4 Neha Berry 2020-09-10 06:40:12 UTC
@Ashish, I remember you tried RGW with HAProxy enabled. Did you also face the same issue?

Comment 5 arun kumar mohan 2020-09-10 08:49:28 UTC
Raised PR: https://github.com/rook/rook/pull/6227
@Sebastian, please take a look.

Comment 6 Michael Adam 2020-09-10 09:29:55 UTC
@seb, in order to move to another release, you do not only need to add the new release flag, but also remove the old one. :-)

Comment 7 Ashish Singh 2020-09-10 09:30:58 UTC
Hey Neha,

No, I did not try RGW behind HAProxy. We only discussed that if the customer wants to use multiple RGWs for load balancing, they should use HAProxy.

Regards,
Ashish Singh

Comment 8 Mudit Agarwal 2020-09-17 11:09:54 UTC
Moving it back to POST because the fix was reverted.

Also, please do not merge it into downstream 4.5 until we have all the acks and downstream is open for check-ins.

Comment 9 Neha Berry 2020-09-29 11:01:07 UTC
Hi Mudit,

Is there a particular reason for backporting the fix to 4.5.z? If it is an urgent must-fix for 4.5.z, are we planning it for 4.5.2, since 4.5.1 is a minimal release?

Comment 10 Mudit Agarwal 2020-09-29 12:31:12 UTC
We will backport it to 4.5.2 only; there is no plan to put this in 4.5.1.

The confusion arises because Seb merged it into the 4.5 branch by mistake and later reverted it, so the fix is not in 4.5.z and won't be until we start accepting changes for 4.5.2.

https://github.com/openshift/rook/pull/122/ is actually a revert PR.

Comment 11 Elad 2020-10-06 10:19:12 UTC
Not QA-acking for now. Will do once we start planning the second 4.5 z-stream.

Comment 12 Mudit Agarwal 2020-11-12 08:54:22 UTC
This BZ doesn't have any acks, and as of now there are no plans for another 4.5.z. Is there any reason to merge https://github.com/openshift/rook/pull/148 to 4.5?

Comment 13 Sébastien Han 2020-11-12 09:03:05 UTC
TBH I have no idea. Arun did the patch and only requested a review, so I assumed everything was in place BZ-wise.

Comment 14 Yaniv Kaul 2020-12-14 07:04:58 UTC
Can we get a QE ack for it, if we feel it's going to be in a 4.5.z? Otherwise, let's close it.

Comment 15 Raz Tamir 2020-12-14 08:59:04 UTC
Yaniv,

We are not planning 4.5.z so this needs to be retargeted to 4.6.z.

Comment 16 Mudit Agarwal 2020-12-15 05:07:15 UTC
I think the fix is already there in 4.6, Arun can you please confirm the same?

Comment 17 arun kumar mohan 2020-12-16 08:09:17 UTC
> I think the fix is already there in 4.6, Arun can you please confirm the same?

Yes, it is already backported to the `openshift/rook` release-4.6 branch:
https://github.com/openshift/rook/commit/1b17580f39e76faf4fd8c5ab89cd38ab7cb917fd

File#line: https://github.com/openshift/rook/blob/release-4.6/cluster/examples/kubernetes/ceph/create-external-cluster-resources.py#L226

