Bug 2088506 - ceph-external-cluster-details-exporter.py should not accept hostname for rgw-endpoint
Summary: ceph-external-cluster-details-exporter.py should not accept hostname for rgw-...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.11.0
Assignee: Parth Arora
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-19 15:15 UTC by Martin Bukatovic
Modified: 2023-08-09 17:03 UTC (History)

Fixed In Version: 4.11.0-96
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-24 13:53:39 UTC
Embargoed:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | red-hat-storage rook pull 385 | 0 | None | open | Bug 2088506: rgw: make fqdn rgw converted to ip everywhere | 2022-06-07 09:33:18 UTC |
| Github | rook rook pull 10309 | 0 | None | open | rgw: make fqdn rgw converted to ip everywhere | 2022-05-31 13:42:22 UTC |
| Red Hat Product Errata | RHSA-2022:6156 | 0 | None | None | None | 2022-08-24 13:53:49 UTC |

Description Martin Bukatovic 2022-05-19 15:15:48 UTC
Description of problem
======================

The help text of the "ceph-external-cluster-details-exporter.py" script notes that
RGW_ENDPOINT can be specified either as <IP>:<PORT> or <FQDN>:<PORT>, but
when one tries to use an FQDN, ocs-external-storagecluster-cephobjectstore fails
to reconcile the CephObjectStore.

**Assuming that using FQDN is not actually supported**, the help text should be
fixed and some validation should be introduced in the script, so that an
incorrect value causes the script to fail (which is much better than a
failure during ODF StorageCluster installation later).

Version-Release number of selected component
============================================

OCP 4.11.0-0.nightly-2022-05-18-171831
ODF 4.11.0-75

How reproducible
================

100%

Steps to Reproduce
==================

1. When creating StorageSystem, use "Connect an external storage platform"
   option and select "Red Hat Ceph Storage"
2. Download the ceph-external-cluster-details-exporter.py script
3. Run its help:

```
$ python3 ceph-external-cluster-details-exporter.py -h
```

Actual results
==============

An option for RGW endpoint is explained as:

```
  --rgw-endpoint RGW_ENDPOINT
                        RADOS Gateway endpoint (in <IP>:<PORT> format). Note:
                        FQDN is also supported(in <FQDN>:<PORT> format)
```

When one tries to use a hostname, the script accepts it, but the setup of the
ODF Ceph object store fails later because of it.

Expected results
================

An option for RGW endpoint should be explained as:

```
  --rgw-endpoint RGW_ENDPOINT
                        RADOS Gateway endpoint (in <IP>:<PORT> format). Note:
                        FQDN is not supported
```

Moreover, when one tries to use a hostname instead of an IPv4 or IPv6 address,
the script should report an error.
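
The validation requested here could look roughly like the sketch below. It is not the script's actual code: `parse_ip_endpoint` is a hypothetical helper name, and accepting bracketed IPv6 literals is an assumption about the input format. It relies only on Python's standard `ipaddress` module.

```python
import ipaddress


def parse_ip_endpoint(endpoint):
    """Split '<IP>:<PORT>' and reject any host that is not a literal IP address.

    Hypothetical helper for illustration; not part of the exporter script.
    """
    # Split on the last colon so that IPv6 literals keep their inner colons.
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host:
        raise ValueError(f"endpoint {endpoint!r} is not in <IP>:<PORT> format")
    # Assumption: IPv6 literals may be wrapped in brackets, e.g. [2001:db8::ffff]:8080
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(f"{host!r} is not a valid IPv4 or IPv6 address") from None
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"{port!r} is not a valid port")
    return str(ip), int(port)
```

With a check like this, a value such as `osd-2.mbukatov-ceph01.qe.example.com:8080` would be rejected when the script runs, rather than during the ODF StorageCluster installation.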

Additional info
===============

On the one hand, our guide (I'm looking at ODF 4.10[1] right now) says to
"Provide the endpoint in the following format: <ip_address>:<port>" where it
gives an example of how to use the script, but on the other hand it strongly
suggests running the help of the script to learn more (the text doesn't
provide all the details):

> Run the following command on the RHCS node to view the list of available
> arguments: 
>
> # python3 ceph-external-cluster-details-exporter.py --help

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html-single/deploying_openshift_data_foundation_in_external_mode/index#deploy-openshift-data-foundation-using-red-hat-ceph-storage

When one tries to use a hostname for the RGW endpoint, setup of the ODF external
storage system fails for the object store:

```
ocs-external-storagecluster-cephobjectstore
failed to reconcile CephObjectStore "openshift-storage/ocs-external-storagecluster-cephobjectstore". failed to create object store deployments: failed to reconcile external endpoint: failed to create or update object store "ocs-external-storagecluster-cephobjectstore" endpoint: failed to create endpoint "rook-ceph-rgw-ocs-external-storagecluster-cephobjectstore". Endpoints "rook-ceph-rgw-ocs-external-storagecluster-cephobjectstore" is invalid: [subsets[0].addresses[0].ip: Invalid value: "osd-2.mbukatov-ceph01.qe.example.com": must be a valid IP address, (e.g. 10.9.8.7 or 2001:db8::ffff), subsets[0].addresses[0].ip: Invalid value: "osd-2.mbukatov-ceph01.qe.example.com": must be a valid IP address]
```

Comment 2 Martin Bukatovic 2022-05-19 15:17:33 UTC
Reported during MetroDR happy path testing.

Comment 3 Raghavendra Talur 2022-05-24 14:36:15 UTC
Travis, is the "ceph-external-cluster-details-exporter.py" maintained by the ceph team or the rook team?

Comment 4 Travis Nielsen 2022-05-24 18:28:19 UTC
Talur, the script is maintained by Rook.
Parth, is this the fix that is already in progress with https://github.com/rook/rook/pull/10309?

Comment 5 Parth Arora 2022-05-25 15:30:08 UTC
Yes @tnielsen, I'm working on it.

It seems there is a problem even after converting it, so maybe something is wrong in the conversion.
I need to look into it. Also, I guess the testing of https://bugzilla.redhat.com/show_bug.cgi?id=2064426 is still pending on the QA side.

Comment 6 Parth Arora 2022-05-26 15:23:08 UTC
@mbukatov can I know what rgw-endpoint value you provided in fqdn format?


PS: the right one should be in this format: `<FQDN>:<PORT>`

Comment 7 Martin Bukatovic 2022-05-26 18:57:31 UTC
(In reply to Parth Arora from comment #6)
> @mbukatov can I know what rgw-endpoint value you provided in fqdn
> format?

I used:

```
--rgw-endpoint osd-2.mbukatov-ceph01.qe.example.com:8080
```

And as you can see in the original bug report, ODF then complained:

```
Invalid value: "osd-2.mbukatov-ceph01.qe.example.com": must be a valid IP address
```

Comment 8 Parth Arora 2022-05-31 10:13:17 UTC
@tnielsen I had a closer look at this.
Yes, this problem will be solved by the upstream PR that is already open: https://github.com/rook/rook/pull/10309

Thanks @mbukatov for pointing it out.
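
For illustration only: the PR title says the FQDN is converted to an IP. One way such a conversion can be done in Python is with the standard `socket.getaddrinfo` call. The sketch below is not taken from the PR; `resolve_host_to_ip` is a hypothetical name, and taking the first address the resolver returns is an assumption.

```python
import socket


def resolve_host_to_ip(host, port=None):
    """Return the first IP address the local resolver reports for host.

    Hypothetical helper for illustration; not the code from the Rook PR.
    Raises socket.gaierror if the name cannot be resolved.
    """
    # getaddrinfo accepts both names and IP literals; a literal comes back unchanged.
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return infos[0][4][0]
```

The returned address is what a Kubernetes Endpoints object can accept in its `ip` field, which is where the hostname was rejected in the original report.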

Comment 9 Mudit Agarwal 2022-05-31 13:42:22 UTC
Please provide qa_ack

Comment 12 Mudit Agarwal 2022-06-07 09:12:14 UTC
Parth, please backport the fix to 4.11

Comment 14 Parth Arora 2022-06-13 13:25:32 UTC
@tnielsen this would also be verified on behalf of https://bugzilla.redhat.com/show_bug.cgi?id=2064426

Comment 15 Parth Arora 2022-06-17 19:26:23 UTC
@vavuthu this would also be verified on behalf of https://bugzilla.redhat.com/show_bug.cgi?id=2064426

Comment 16 Vijay Avuthu 2022-06-20 09:10:55 UTC
Verified with 4.11.0-96

FQDN is supported in the latest builds

job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/13702/consoleFull

2022-06-17 20:15:05  14:45:05 - MainThread - ocs_ci.utility.connection - INFO  - Executing cmd: python3 /tmp/external-cluster-details-exporter-n4l1715f.py --rbd-data-pool-name rbd --rgw-endpoint dc-long-xxxs3-node-2.xxx.com:8080 on 10.x.xxx.xx9
2022-06-17 20:15:06  14:45:06 - MainThread - ocs_ci.utility.connection - INFO  - Executing cmd: ceph auth get client.admin on 10.x.xx1.xx9
2022-06-17 20:15:06  14:45:06 - MainThread - ocs_ci.utility.templating - INFO  - apiVersion: v1

storage system status is good

Status:
  Conditions:
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-20T08:50:14Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-20T08:50:14Z
    Message:               Reconcile is completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-17T14:45:07Z
    Message:               StorageSystem CR is valid
    Reason:                Valid
    Status:                False
    Type:                  StorageSystemInvalid
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-20T08:50:14Z
    Reason:                Ready
    Status:                True
    Type:                  VendorCsvReady
    Last Heartbeat Time:   2022-06-20T08:50:14Z
    Last Transition Time:  2022-06-17T14:45:07Z
    Reason:                Found
    Status:                True
    Type:                  VendorSystemPresent

> help for script

  --rgw-endpoint RGW_ENDPOINT
                        RADOS Gateway endpoint (in <IP>:<PORT> format). Note:
                        FQDN is also supported(in <FQDN>:<PORT> format)

Comment 19 errata-xmlrpc 2022-08-24 13:53:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156

