Bug 1850704 - Independent Mode: The ceph-external-cluster-details-exporter.py generates the json output even with incorrect rgw-endpoint IP address
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Sébastien Han
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-24 18:29 UTC by Neha Berry
Modified: 2020-10-26 06:01 UTC (History)
7 users

Fixed In Version: 4.5.0-508.ci
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1862405 (view as bug list)
Environment:
Last Closed: 2020-09-15 10:17:53 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github rook rook pull 5833 0 None closed ceph: add http dial to verify external endpoint 2021-01-12 07:41:16 UTC
Red Hat Product Errata RHBA-2020:3754 0 None None None 2020-09-15 10:18:21 UTC

Description Neha Berry 2020-06-24 18:29:49 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
----------------------------------------------------------------------
As part of the Independent Mode install in the OCP UI, one gets the option to download ceph-external-cluster-details-exporter.py and run it on the external RHCS cluster to collect the config details.

Command:  python3 ceph-external-cluster-details-exporter.py --rgw-endpoint <RGW endpoint:8080> --rbd-data-pool-name <pool name>

Observation:

1. Even if one provides an invalid RGW endpoint IP, the JSON is generated, and on uploading it in the UI, the StorageCluster gets created.
2. If one provides an incorrect pool name, the script throws an error (as expected):
>>   Excecution Failed: The provided 'rbd-data-pool-name': cbp, don't exists

Issue:

1. Unlike the handling of an incorrect pool name, the script doesn't throw an error when an incorrect RGW endpoint is provided. This should be handled as well.


AFAIK, the RGW endpoint information can also be validated against the Ceph cluster, similar to the block pool name.
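A lightweight reachability test of the kind being asked for could be sketched as follows. This is an illustrative snippet, not the actual fix that later landed; the helper name `endpoint_reachable` is hypothetical:

```python
import socket

def endpoint_reachable(endpoint, timeout=5):
    """Return True if a TCP connection to 'host:port' succeeds within timeout."""
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        return False  # no port given, e.g. "1.2.3.4"
    try:
        # create_connection resolves the host and attempts the TCP handshake
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        return False  # refused, unroutable, timed out, or non-numeric port
```

With a check like this, the bogus endpoint from the reproduction below (1.2.3.4:8080) would fail the connection attempt, and the script could abort before emitting any JSON.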




Version of all relevant components (if applicable):
----------------------------------------------------------------------

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-17-001505   True        False         7d6h    Cluster version is 4.5.0-0.nightly-2020-06-17-001505


$ oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION        REPLACES              PHASE
awss3operator.1.0.1             AWS S3 Operator               1.0.1          awss3operator.1.0.0   Succeeded
lib-bucket-provisioner.v1.0.0   lib-bucket-provisioner        1.0.0                                Succeeded
ocs-operator.v4.5.0-460.ci      OpenShift Container Storage   4.5.0-460.ci                         Installing



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------
The RGW StorageClass is created with an incorrect IP address.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
Not known

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------

3

Is this issue reproducible?
----------------------------------------------------------------------
Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
No. Independent Mode is a new feature.

Steps to Reproduce:
----------------------------------------------------------------------
1. Install OCP 4.5
2. Using deploy-with-olm.yaml, create the ocs-catalogsource with the latest OCS 4.5 build
3. Install the subscription from OperatorHub -> RHOCS Operator -> Install Subscription
4. In the openshift-storage namespace, navigate to Installed Operators -> OCS Operator -> StorageCluster and click Create OCS Cluster Service
5. Select Independent Mode
6. Download the script from the hyperlink below:
  Connect to external cluster
    Download ceph-external-cluster-details-exporter.py script and run on the RHCS cluster, then upload the results(JSON) in the External cluster metadata field. Download Script

7. On an external RHCS cluster, execute the script but provide an incorrect RGW endpoint IP address, e.g. 1.2.3.4

python3 ceph-external-cluster-details-exporter.py --rgw-endpoint 1.2.3.4:8080 --rbd-data-pool-name cbp-bm17

8. The JSON is generated. Upload the incorrect JSON in the UI and click Create. The StorageCluster gets created with an incorrect RGW endpoint in the RGW StorageClass.
    

Actual results:
----------------------------------------------------------------------
The StorageCluster gets created, and the RGW StorageClass is created with an incorrect RGW endpoint IP.

Expected results:
----------------------------------------------------------------------
Validation should be in place to fail the execution of the script if an incorrect RGW endpoint IP address is provided.



Additional info:
----------------------------------------------------------------------
$ oc get sc ocs-independent-storagecluster-ceph-rgw -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2020-06-24T18:06:16Z"
  managedFields:
  - apiVersion: storage.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:parameters:
        .: {}
        f:endpoint: {}
        f:objectStoreNamespace: {}
        f:region: {}
      f:provisioner: {}
      f:reclaimPolicy: {}
      f:volumeBindingMode: {}
    manager: ocs-operator
    operation: Update
    time: "2020-06-24T18:06:16Z"
  name: ocs-independent-storagecluster-ceph-rgw
  resourceVersion: "7072876"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ocs-independent-storagecluster-ceph-rgw
  uid: 387fe9c2-8dd7-491d-82d7-e465469c0dc3
parameters:
  endpoint: 1.2.3.4:8080
  objectStoreNamespace: openshift-storage
  region: us-east-1
provisioner: openshift-storage.ceph.rook.io/bucket
reclaimPolicy: Delete
volumeBindingMode: Immediate


_____________________________________________


Wed Jun 24 18:17:01 UTC 2020
  cluster:
    id:     fe01cf06-8c2b-4e5b-9fea-8a6a8e402b88
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum dell-r730-031,dell-r730-037,dell-r730-044 (age 2w)
    mgr: dell-r730-037(active, since 2d)
    mds: cephfs:1 {0=dell-r730-037=up:active} 1 up:standby
    osd: 9 osds: 9 up (since 2w), 9 in (since 2w)
    rgw: 1 daemon active (dell-r730-031.rgw0)        <<<----- RGW endpoint's hostname
 
  task status:
    scrub status:
        mds.dell-r730-037: idle
 
  data:
    pools:   12 pools, 488 pgs
    objects: 146.81k objects, 533 GiB
    usage:   1.6 TiB used, 3.3 TiB / 4.9 TiB avail
    pgs:     488 active+clean
 
  io:
    client:   36 KiB/s rd, 4.4 MiB/s wr, 9 op/s rd, 153 op/s wr

Comment 3 Michael Adam 2020-07-07 21:15:58 UTC
This script is not in ocs-operator. Or in rook. Not sure where it is... :-o

Comment 4 Michael Adam 2020-07-07 21:16:53 UTC
Seb, I think you were involved in creating this script.
Can you provide clarity?

Comment 5 Sébastien Han 2020-07-08 07:11:22 UTC
Neha,

We cannot use the DNS name that might be reported from the service map; it is unreliable, and there is no guarantee that the containers can resolve it, hence we use an IP.
As for the validation, we can implement a check beforehand, or the UI can do it.

As I said earlier, the more validation we do, the more bugs we might introduce.

Arun, please have a look at implementing an HTTP check against the given endpoint, for now, assume HTTP only.

Moving to 4.6.
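The HTTP-only check requested above could look roughly like this in the exporter script. This is a sketch, not the code from the actual PR (which added an HTTP dial on the Rook side); the helper name `validate_rgw_endpoint` is hypothetical, and it assumes that any HTTP response, even an error status, counts as a live endpoint:

```python
import urllib.error
import urllib.request

def validate_rgw_endpoint(endpoint, timeout=5):
    """Return True if an HTTP server answers at 'host:port' (HTTP only).

    Any HTTP response, even a 4xx/5xx status, proves something is
    listening; only connection-level failures mark the endpoint invalid.
    """
    try:
        urllib.request.urlopen("http://{}".format(endpoint), timeout=timeout)
    except urllib.error.HTTPError:
        return True   # got an HTTP status back: the endpoint is live
    except (urllib.error.URLError, OSError):
        return False  # refused, unroutable, or timed out
    return True
```

Note that `HTTPError` must be caught before `URLError` (it is a subclass), so a reachable RGW that returns an error status is still treated as valid, while the bogus 1.2.3.4:8080 from the report would time out and fail.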

Comment 7 Sébastien Han 2020-07-15 20:17:50 UTC
Neha, I have a PR for this. If you make it a blocker we can proceed and include it in 4.5, otherwise it will be in 4.6.

Comment 8 Neha Berry 2020-07-16 07:10:57 UTC

(In reply to leseb from comment #7)
> Neha, I have a PR for this. If you make it a blocker we can proceed and
> include it in 4.5, otherwise it will be in 4.6.

+1. Thanks a lot Sebastien.

Proposing as a blocker for OCS 4.5 as this fix will help a lot in mitigating incorrect RGW related issues. See comment#7 as well.

Comment 11 Michael Adam 2020-07-30 14:54:14 UTC
4.5.0-508.ci contains the fix

Comment 18 errata-xmlrpc 2020-09-15 10:17:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

Comment 19 arun kumar mohan 2020-10-26 06:01:39 UTC
As the bug is fixed and doesn't require any other info...

