Description of problem
======================

Help text of "ceph-external-cluster-details-exporter.py" script notes that RGW_ENDPOINT can be specified either as <IP>:<PORT> or <FQDN>:<PORT>, but when one tries to use FQDN, ocs-external-storagecluster-cephobjectstore fails to reconcile CephObjectStore.

**Assuming that using FQDN is not actually supported**, the help text should be fixed and some validation should be introduced in the script, so that an incorrect value causes the script to fail (which is much better than a failure during ODF StorageCluster installation later).

Version-Release number of selected component
============================================

OCP 4.11.0-0.nightly-2022-05-18-171831
ODF 4.11.0-75

How reproducible
================

100%

Steps to Reproduce
==================

1. When creating StorageSystem, use "Connect an external storage platform" option and select "Red Hat Ceph Storage"
2. Download the ceph-external-cluster-details-exporter.py script
3. Run its help:

```
$ python3 ceph-external-cluster-details-exporter.py -h
```

Actual results
==============

An option for RGW endpoint is explained as:

```
--rgw-endpoint RGW_ENDPOINT RADOS Gateway endpoint (in <IP>:<PORT> format). Note: FQDN is also supported(in <FQDN>:<PORT> format)
```

When one tries to use a hostname, the script accepts it, but the setup of the ODF ceph object store fails because of it later.

Expected results
================

An option for RGW endpoint should be explained as:

```
--rgw-endpoint RGW_ENDPOINT RADOS Gateway endpoint (in <IP>:<PORT> format). Note: FQDN is not supported
```

Moreover, when one tries to use a hostname instead of an IPv4 or IPv6 address, the script should report an error.
Additional info
===============

On the one hand, our guide (I'm looking at ODF 4.10[1] right now) suggests to "Provide the endpoint in the following format: <ip_address>:<port>" where it gives an example of how to use the script, but on the other hand it strongly suggests running the help of the script to learn more (the text doesn't provide all the details):

> Run the following command on the RHCS node to view the list of available
> arguments:
>
> # python3 ceph-external-cluster-details-exporter.py --help

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html-single/deploying_openshift_data_foundation_in_external_mode/index#deploy-openshift-data-foundation-using-red-hat-ceph-storage

When one tries to use a hostname for the RGW endpoint, setup of the ODF external storage system fails for the object store:

```
ocs-external-storagecluster-cephobjectstore failed to reconcile CephObjectStore "openshift-storage/ocs-external-storagecluster-cephobjectstore". failed to create object store deployments: failed to reconcile external endpoint: failed to create or update object store "ocs-external-storagecluster-cephobjectstore" endpoint: failed to create endpoint "rook-ceph-rgw-ocs-external-storagecluster-cephobjectstore". Endpoints "rook-ceph-rgw-ocs-external-storagecluster-cephobjectstore" is invalid: [subsets[0].addresses[0].ip: Invalid value: "osd-2.mbukatov-ceph01.qe.example.com": must be a valid IP address, (e.g. 10.9.8.7 or 2001:db8::ffff), subsets[0].addresses[0].ip: Invalid value: "osd-2.mbukatov-ceph01.qe.example.com": must be a valid IP address]
```
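For illustration, the validation requested under "Expected results" could look roughly like the sketch below. It is not taken from the exporter script: the function names `split_endpoint` and `validate_ip_endpoint` are hypothetical, and it assumes that only IPv4 or IPv6 literals (optionally in brackets for IPv6) should be accepted.

```python
import ipaddress


def split_endpoint(endpoint):
    """Split '<host>:<port>' (or '[<ipv6>]:<port>') into (host, port).

    Hypothetical helper, not part of the real exporter script.
    """
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"endpoint {endpoint!r} is not in <IP>:<PORT> format")
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port {port_num} is out of range (1-65535)")
    # Accept the bracketed IPv6 form, e.g. [2001:db8::ffff]:8080
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    return host, port_num


def validate_ip_endpoint(endpoint):
    """Return (ip, port), or raise ValueError if the host part is not an IP."""
    host, port = split_endpoint(endpoint)
    try:
        ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(
            f"host {host!r} is not a valid IPv4 or IPv6 address"
        ) from None
    return host, port
```

With a check like this, the value from this report (`osd-2.mbukatov-ceph01.qe.example.com:8080`) would be rejected when the script runs, instead of surfacing later as the Kubernetes Endpoints validation error shown above.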
Reported during MetroDR happy path testing.
Travis, is the "ceph-external-cluster-details-exporter.py" maintained by the ceph team or the rook team?
Talur, the script is maintained by Rook. Parth, is this the fix that is already in progress with https://github.com/rook/rook/pull/10309?
Yes @tnielsen, I'm working on it. It seems there may still be a problem even after the conversion, so maybe something is wrong in the conversion; I need to look into it. Also, I guess the testing of https://bugzilla.redhat.com/show_bug.cgi?id=2064426 is still pending on the QA side.
@mbukatov, can I know what rgw-endpoint value you provided in FQDN format? PS: the right one should be in this format: `<FQDN>:<PORT>`
(In reply to Parth Arora from comment #6)
> @mbukatov can I know what rgw-endpoint value you provided in fqdn
> format?

I used:

```
--rgw-endpoint osd-2.mbukatov-ceph01.qe.example.com:8080
```

And as you can see in the original bug report, ODF then complained:

```
Invalid value: "osd-2.mbukatov-ceph01.qe.example.com": must be a valid IP address
```
@tnielsen I had a closer look at this. Yes, this problem will be solved by the upstream PR that is already open: https://github.com/rook/rook/pull/10309

Thanks @mbukatov for pointing it out.
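As a rough illustration of the FQDN-to-IP conversion mentioned in this thread, a minimal sketch follows. It is not taken from the upstream PR, whose implementation may differ; `resolve_host` and its `resolver` parameter are hypothetical names, and taking the first address returned by the resolver is an assumption.

```python
import ipaddress
import socket


def resolve_host(host, resolver=socket.getaddrinfo):
    """Return an IP address string for host.

    IP literals are returned unchanged; anything else is looked up with
    the resolver (socket.getaddrinfo by default). Hypothetical helper.
    """
    try:
        return str(ipaddress.ip_address(host))
    except ValueError:
        pass  # not an IP literal, fall through to name resolution
    try:
        infos = resolver(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as err:
        raise ValueError(f"cannot resolve {host!r}: {err}") from err
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address. Assumption: the first entry is used.
    return infos[0][4][0]
```

The `resolver` argument exists only so the lookup can be replaced in tests; a caller would normally pass just the host part of the `--rgw-endpoint` value.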
Please provide qa_ack
Parth, please backport the fix to 4.11
@tnielsen this will also be verified as part of https://bugzilla.redhat.com/show_bug.cgi?id=2064426
@vavuthu this will also be verified as part of https://bugzilla.redhat.com/show_bug.cgi?id=2064426
Verified with 4.11.0-96

FQDN is supported in latest builds

job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/13702/consoleFull

2022-06-17 20:15:05 14:45:05 - MainThread - ocs_ci.utility.connection - INFO - Executing cmd: python3 /tmp/external-cluster-details-exporter-n4l1715f.py --rbd-data-pool-name rbd --rgw-endpoint dc-long-xxxs3-node-2.xxx.com:8080 on 10.x.xxx.xx9
2022-06-17 20:15:06 14:45:06 - MainThread - ocs_ci.utility.connection - INFO - Executing cmd: ceph auth get client.admin on 10.x.xx1.xx9
2022-06-17 20:15:06 14:45:06 - MainThread - ocs_ci.utility.templating - INFO - apiVersion: v1

storage system status is good

Status:
  Conditions:
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-20T08:50:14Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-20T08:50:14Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-17T14:45:07Z
    Message:               StorageSystem CR is valid
    Reason:                Valid
    Status:                False
    Type:                  StorageSystemInvalid
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-20T08:50:14Z
    Reason:                Ready
    Status:                True
    Type:                  VendorCsvReady
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-17T14:45:07Z
    Reason:                Found
    Status:                True
    Type:                  VendorSystemPresent

> help for script

--rgw-endpoint RGW_ENDPOINT RADOS Gateway endpoint (in <IP>:<PORT> format). Note: FQDN is also supported(in <FQDN>:<PORT> format)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156