Description of problem (please be as detailed as possible and provide log snippets):
While deploying OpenShift Container Storage 4.5 in external mode with a TLS-enabled RGW endpoint (or an RGW behind a TLS-enabled HAProxy, as suggested in the documentation [1]), the script ceph-external-cluster-details-exporter.py fails.

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/object_gateway_configuration_and_administration_guide/index#rgw-configuring-ha-proxy-keepalived-rgw

Version of all relevant components (if applicable):
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.7     True        False         5d12h   Cluster version is 4.5.7

$ oc get csv -n openshift-storage
NAME                                           DISPLAY                       VERSION                 REPLACES   PHASE
elasticsearch-operator.4.5.0-202008100413.p0   Elasticsearch Operator        4.5.0-202008100413.p0              Succeeded
ocs-operator.v4.5.0-545.ci                     OpenShift Container Storage   4.5.0-545.ci                       Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
To continue with the deployment, an unsecured RGW endpoint has to be configured, which has information security implications.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:
N/A

Steps to Reproduce:
1. Have a running OCP 4.5
2. Have a running Ceph 4.1 with TLS-enabled RGW
3. Enable ocs catalogsource for the latest OCS 4.5 build
4. Install operator
5. Create OCS Cluster Service in Independent Mode
6. Attempt to collect the cluster details with ceph-external-cluster-details-exporter.py

Actual results:
The connection fails.

Expected results:
The connection to the TLS-enabled RGW should succeed.

Additional info:
$ python3 ceph-external-cluster-details-exporter.py --rgw-endpoint 192.168.122.10:443 --rbd-data-pool-name rbd_k8s_openshift
Excecution Failed: failed to connect to rgw endpoint http://192.168.122.10:443

The endpoint is reachable, though. Running the same test with an unencrypted HTTP RGW endpoint works as expected.
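
The error message above shows the endpoint being dialed as http://192.168.122.10:443, that is, with a plain-HTTP scheme on the TLS port. One way a script could handle a bare host:port endpoint is to try each scheme in turn and keep the first one that answers. The following is only a minimal sketch of that idea under stated assumptions, not the code from the exporter script or from the linked PR; the helper names candidate_urls, http_probe and first_reachable are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def candidate_urls(endpoint, schemes=("http", "https")):
    """Build one URL per scheme for a bare host:port endpoint.

    An endpoint that already carries a scheme is returned unchanged.
    """
    if "://" in endpoint:
        return [endpoint]
    return ["{}://{}".format(scheme, endpoint) for scheme in schemes]


def http_probe(url, timeout=3):
    """Return True only if a HEAD request to url completes successfully.

    Assumption: any error (refused connection, TLS handshake or
    certificate failure, HTTP error status, malformed reply) counts
    as "not reachable" for this scheme.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        return False


def first_reachable(endpoint, probe=http_probe):
    """Return the first candidate URL accepted by probe, else raise."""
    for url in candidate_urls(endpoint):
        if probe(url):
            return url
    raise RuntimeError(
        "failed to connect to rgw endpoint {}".format(endpoint))
```

With this shape, first_reachable("192.168.122.10:443") would try the http:// URL first and fall back to https://. Note that with default settings the https probe only succeeds if the RGW or HAProxy certificate is trusted by the client, so a self-signed certificate would still need its CA added to the trust store.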
Arun PTAL. Will be in the next z-stream
Ack. Looking into it..
@Ashish, I remember you tried RGW with HAProxy enabled. Did you also face the same issue?
Raised PR: https://github.com/rook/rook/pull/6227 @Sebastian, please take a look.
@seb, in order to move to another release, you do not only need to add the new release flag, but also remove the old one. :-)
Hey Neha,
No, I did not try RGW behind HAProxy. We only discussed that if the customer wants to use multiple RGWs for load balancing, they should use HAProxy.
Regards,
Ashish Singh
Moving it back to POST because the fix was reverted. Also, please do not merge it in Downstream 4.5 till we have all the acks and downstream is open for check-ins.
Hi Mudit, is there a particular reason for backporting the fix to 4.5.z? If it is urgent and a must-fix for 4.5.z, are we planning it for 4.5.2, since 4.5.1 is a minimal release?
We will backport it to 4.5.2 only; there is no plan to put this in 4.5.1. The confusion here is because Seb merged it into the 4.5 branch by mistake and later reverted it, so the fix is not in 4.5.z and won't be until we start accepting changes for 4.5.2. https://github.com/openshift/rook/pull/122/ is actually a revert PR.
For now, not QA acking. Will do so once we start planning the second 4.5 z-stream.
This BZ doesn't have any acks, and as of now there are no plans for the next 4.5.z. Is there any reason to merge https://github.com/openshift/rook/pull/148 to 4.5?
tbh I have no idea, Arun did the patch and requested review only, so I assumed everything was in place BZ wise...
Can we get a QE ack for it, if we do feel it's going to be in a 4.5.z? Or close...
Yaniv, We are not planning 4.5.z so this needs to be retargeted to 4.6.z.
I think the fix is already there in 4.6, Arun can you please confirm the same?
> I think the fix is already there in 4.6, Arun can you please confirm the same?

Yes, it is already backported to the `openshift/rook` release-4.6 branch:
https://github.com/openshift/rook/commit/1b17580f39e76faf4fd8c5ab89cd38ab7cb917fd
File#line: https://github.com/openshift/rook/blob/release-4.6/cluster/examples/kubernetes/ceph/create-external-cluster-resources.py#L226