Bug 1850704

Summary: Independent Mode: The ceph-external-cluster-details-exporter.py generates the JSON output even with an incorrect rgw-endpoint IP address
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Neha Berry <nberry>
Component: rook
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Rachael <rgeorge>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.5
CC: amohan, bkunal, jthottan, madam, ocs-bugs, shan, sostapov
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: OCS 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.5.0-508.ci
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1862405
Environment:
Last Closed: 2020-09-15 10:17:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Neha Berry 2020-06-24 18:29:49 UTC
Description of problem (please be as detailed as possible and provide log snippets):
----------------------------------------------------------------------
As part of the Independent Mode install in the OCP UI, the user is offered the ceph-external-cluster-details-exporter.py script to download and run on the external RHCS cluster to collect its configuration details.

Command:  python3 ceph-external-cluster-details-exporter.py --rgw-endpoint <RGW endpoint:8080> --rbd-data-pool-name <pool name>

Observation:

1. Even if an invalid RGW endpoint IP is provided, the JSON is generated, and when it is uploaded in the UI, the StorageCluster gets created.
2. If an incorrect pool name is provided, the script throws an error (as expected):
>>   Excecution Failed: The provided 'rbd-data-pool-name': cbp, don't exists

Issue:

1. Unlike the handling of an incorrect pool name, the script does not throw an error when an incorrect RGW endpoint is provided. This case should be handled as well.


AFAIK, the RGW endpoint information can also be validated against the Ceph cluster, similar to the block pool name.




Version of all relevant components (if applicable):
----------------------------------------------------------------------

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-17-001505   True        False         7d6h    Cluster version is 4.5.0-0.nightly-2020-06-17-001505


$ oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION        REPLACES              PHASE
awss3operator.1.0.1             AWS S3 Operator               1.0.1          awss3operator.1.0.0   Succeeded
lib-bucket-provisioner.v1.0.0   lib-bucket-provisioner        1.0.0                                Succeeded
ocs-operator.v4.5.0-460.ci      OpenShift Container Storage   4.5.0-460.ci                         Installing



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------
The RGW StorageClass (SC) is created with an incorrect endpoint IP address.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
Not known

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------

3

Is this issue reproducible?
----------------------------------------------------------------------
Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
No. Independent Mode is a new feature.

Steps to Reproduce:
----------------------------------------------------------------------
1. Install OCP 4.5
2. Using deploy-with-olm.yaml, create the ocs-catalogsource with the latest OCS 4.5 build.
3. Install the subscription from OperatorHub -> RHOCS Operator -> Install Subscription.
4. In the openshift-storage namespace, navigate to Installed Operators -> OCS Operator -> StorageCluster and click Create OCS Cluster Service.
5. Select Independent Mode.
6. Download the script from the hyperlink below:
  Connect to external cluster
    Download ceph-external-cluster-details-exporter.py script and run on the RHCS cluster, then upload the results(JSON) in the External cluster metadata field. Download Script

7. On the external RHCS cluster, run the script but provide an incorrect RGW endpoint IP address, e.g. 1.2.3.4:

python3 ceph-external-cluster-details-exporter.py --rgw-endpoint 1.2.3.4:8080 --rbd-data-pool-name cbp-bm17

8. The JSON is generated. Upload the incorrect JSON in the UI and click Create. The StorageCluster gets created with an incorrect RGW endpoint in the RGW StorageClass.

Actual results:
----------------------------------------------------------------------
The StorageCluster gets created, and the RGW StorageClass is created with an incorrect RGW endpoint IP.

Expected results:
----------------------------------------------------------------------
Some validation should be in place so that the script fails if an incorrect RGW endpoint IP address is provided.



Additional info:
----------------------------------------------------------------------
$ oc get sc ocs-independent-storagecluster-ceph-rgw -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2020-06-24T18:06:16Z"
  managedFields:
  - apiVersion: storage.k8s.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:parameters:
        .: {}
        f:endpoint: {}
        f:objectStoreNamespace: {}
        f:region: {}
      f:provisioner: {}
      f:reclaimPolicy: {}
      f:volumeBindingMode: {}
    manager: ocs-operator
    operation: Update
    time: "2020-06-24T18:06:16Z"
  name: ocs-independent-storagecluster-ceph-rgw
  resourceVersion: "7072876"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ocs-independent-storagecluster-ceph-rgw
  uid: 387fe9c2-8dd7-491d-82d7-e465469c0dc3
parameters:
  endpoint: 1.2.3.4:8080
  objectStoreNamespace: openshift-storage
  region: us-east-1
provisioner: openshift-storage.ceph.rook.io/bucket
reclaimPolicy: Delete
volumeBindingMode: Immediate


_____________________________________________


Wed Jun 24 18:17:01 UTC 2020
  cluster:
    id:     fe01cf06-8c2b-4e5b-9fea-8a6a8e402b88
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum dell-r730-031,dell-r730-037,dell-r730-044 (age 2w)
    mgr: dell-r730-037(active, since 2d)
    mds: cephfs:1 {0=dell-r730-037=up:active} 1 up:standby
    osd: 9 osds: 9 up (since 2w), 9 in (since 2w)
    rgw: 1 daemon active (dell-r730-031.rgw0)        <<<----- RGW endpoint's hostname
 
  task status:
    scrub status:
        mds.dell-r730-037: idle
 
  data:
    pools:   12 pools, 488 pgs
    objects: 146.81k objects, 533 GiB
    usage:   1.6 TiB used, 3.3 TiB / 4.9 TiB avail
    pgs:     488 active+clean
 
  io:
    client:   36 KiB/s rd, 4.4 MiB/s wr, 9 op/s rd, 153 op/s wr

Comment 3 Michael Adam 2020-07-07 21:15:58 UTC
This script is not in ocs-operator. Or in rook. Not sure where it is... :-o

Comment 4 Michael Adam 2020-07-07 21:16:53 UTC
Seb, I think you were involved in creating this script.
Can you provide clarity?

Comment 5 Sébastien Han 2020-07-08 07:11:22 UTC
Neha,

We cannot use the DNS name that might be reported by the service map: it is unreliable, and there is no guarantee that the containers can resolve it, hence we use an IP.
As for validation, we can implement it in the script beforehand, or the UI can do it.
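Since the script takes an IP address rather than a DNS name, one cheap check it could run first is to confirm that --rgw-endpoint really has the form <IP>:<port>. The following is a minimal sketch using only Python's standard ipaddress module; parse_rgw_endpoint is a hypothetical helper, not a function from ceph-external-cluster-details-exporter.py, and accepting bracketed IPv6 literals is an assumption.

```python
from __future__ import annotations

import ipaddress


def parse_rgw_endpoint(endpoint: str) -> tuple[str, int]:
    """Split a hypothetical "<IP>:<port>" argument and validate both parts.

    Raises ValueError if the host is not a literal IP address (DNS names
    are rejected) or if the port is missing or out of range.
    """
    # Split on the last colon so the port is whatever follows it.
    host, sep, port_str = endpoint.rpartition(":")
    if not sep or not host:
        raise ValueError(f"endpoint {endpoint!r} must be in <IP>:<port> form")
    # Assumption: accept bracketed IPv6 literals such as "[fd00::1]:8080".
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    try:
        ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(f"{host!r} is not an IP address") from None
    if not port_str.isdecimal() or not 0 < int(port_str) < 65536:
        raise ValueError(f"invalid port {port_str!r}")
    return host, int(port_str)
```

Note that a well-formed but wrong address, such as 1.2.3.4:8080 from the reproduction steps, passes this check, so it can only complement a check that actually contacts the endpoint.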

As I said earlier, the more validation we do, the more bugs we might introduce.

Arun, please look into implementing an HTTP check against the given endpoint; for now, assume HTTP only.
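A minimal sketch of the HTTP check suggested here, assuming plain HTTP only and using just the Python standard library. The helper name rgw_endpoint_reachable and the 3-second default timeout are illustrative assumptions; this is not the code that shipped in 4.5.0-508.ci. Any HTTP response, including an error status, is treated as proof that a server is listening, since the goal is only to catch an address where nothing answers.

```python
import http.client
import urllib.error
import urllib.request


def rgw_endpoint_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at http://<endpoint>/.

    endpoint is "<IP>:<port>", as passed to --rgw-endpoint.
    """
    url = f"http://{endpoint}/"
    # Ignore any http_proxy environment settings so that the endpoint
    # itself is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (403, 404, 501, ...),
        # which still proves that an HTTP server is listening.
        return True
    except (OSError, http.client.HTTPException):
        # URLError (connection refused, unreachable host, timeout) is an
        # OSError subclass; HTTPException covers a non-HTTP listener.
        return False
```

An exporter script could call this before writing the JSON and exit with an error when it returns False, the same way it already rejects a nonexistent pool. For an unroutable address like 1.2.3.4:8080, the call would normally fail only after the timeout expires, so the timeout bounds how long the script waits.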

Moving to 4.6.

Comment 7 Sébastien Han 2020-07-15 20:17:50 UTC
Neha, I have a PR for this. If you make it a blocker we can proceed and include it in 4.5, otherwise it will be in 4.6.

Comment 8 Neha Berry 2020-07-16 07:10:57 UTC

(In reply to leseb from comment #7)
> Neha, I have a PR for this. If you make it a blocker we can proceed and
> include it in 4.5, otherwise it will be in 4.6.

+1. Thanks a lot, Sebastien.

Proposing as a blocker for OCS 4.5, as this fix will help a lot in mitigating issues caused by an incorrect RGW endpoint. See comment #7 as well.

Comment 11 Michael Adam 2020-07-30 14:54:14 UTC
4.5.0-508.ci contains the fix

Comment 18 errata-xmlrpc 2020-09-15 10:17:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

Comment 19 arun kumar mohan 2020-10-26 06:01:39 UTC
As the bug is fixed and doesn't require any other info...