Bug 1824882

Summary: [GSS][vsphere] Scaling OCS4 on a different StorageClass is not reflected correctly
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Feras Al Taher <faltahe>
Component: management-console Assignee: Nishanth Thomas <nthomas>
Status: CLOSED DUPLICATE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2 CC: assingh, bkunal, dkochuka, hnallurv, jefbrown, madam, nthomas, ocs-bugs, ratamir
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-17 06:46:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Feras Al Taher 2020-04-16 15:37:21 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
We are trying to scale OCS on vSphere using a different StorageClass. According to the OCS GUI and the Red Hat documentation [1], you can choose a different StorageClass when scaling the OCS cluster. However, the resulting YAML does not create a second storageDeviceSets entry with the different StorageClass; it only increases the count of the original storageDeviceSets.


[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html-single/managing_openshift_container_storage/index#proc_scaling-up-storage-by-adding-capacity-to-your-openshift-container-storage-nodes_rhocs


Version of all relevant components (if applicable):
Tested so far in OCS4.2


Does this issue impact your ability to continue to work with the product?
There is a workaround, but we are not sure whether it is supported.

Is there any workaround available to the best of your knowledge?
Yes. We can successfully scale OCS storage with a different StorageClass by editing the StorageCluster CR and adding a new storageDeviceSets entry that references the other StorageClass.
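A minimal sketch of that workaround, based on the device set template in the CR dump captured below. The second entry's name ('ocs-deviceset-new') is illustrative, not taken from the actual cluster, and whether a manually added device set is a supported configuration is exactly the open question above:

```yaml
# Sketch only: StorageCluster spec after manually adding a second device set.
spec:
  manageNodes: false
  storageDeviceSets:
  - count: 1
    name: ocs-deviceset                # original set, backed by the 'thin' SC
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti
        storageClassName: thin
        volumeMode: Block
    portable: true
    replica: 3
  - count: 1
    name: ocs-deviceset-new            # illustrative name for the added set
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti
        storageClassName: new-thin-sc  # the different StorageClass
        volumeMode: Block
    portable: true
    replica: 3
```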


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
It is very simple to reproduce this bug.


Is this issue reproducible?
YES

Can this issue reproduce from the UI?
YES


Steps to Reproduce:

1. Deployed the OCS cluster with OSDs using the 'thin' storage class.
2. While scaling up the storage cluster, chose a new storage class 'new-thin-sc' that was backed by a different datastore.
3. The storage cluster was scaled up, _but_ storage was consumed from the 'thin' SC and not 'new-thin-sc'.
4. Further, checking the StorageCluster CR, the scale operation had only increased the count from 1 to 2 on the original storageDeviceSets.

** Data Captured :

+ Default Storage Class:
$ oc get sc thin -oyaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-04-14T07:42:02Z"
  name: thin
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: a029e6a6-0a47-4d4a-9088-7a027252b398
  resourceVersion: "9847"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/thin
  uid: a3e32ab4-21e5-4261-a811-d669eb0dfd97
parameters:
  diskformat: thin
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate


+ New Storage Class using a different datastore:
$ oc get sc new-thin-sc -oyaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2020-04-14T11:17:37Z"
  name: new-thin-sc
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: a029e6a6-0a47-4d4a-9088-7a027252b398
  resourceVersion: "96067"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/new-thin-sc
  uid: 7d921c90-e1d9-4c07-b063-3e01eb8d7c02
parameters:
  datastore: NFSDatastore1
  diskformat: thin
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate


+ StorageCluster CR after initial deployment:
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  creationTimestamp: "2020-04-14T08:07:33Z"
  generation: 1
  name: ocs-storagecluster
  namespace: openshift-storage
  resourceVersion: "100000"
  selfLink: /apis/ocs.openshift.io/v1/namespaces/openshift-storage/storageclusters/ocs-storagecluster
  uid: eb058c82-6c99-4152-a19b-d60ca6d99b6e
spec:
  manageNodes: false
  storageDeviceSets:
  - count: 1
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti
        storageClassName: thin
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: true
    replica: 3
    resources: {}
status:
  cephBlockPoolsCreated: true
  cephFilesystemsCreated: true
  cephObjectStoreUsersCreated: true
  cephObjectStoresCreated: true
  conditions:
  - lastHeartbeatTime: "2020-04-14T11:27:57Z"
    lastTransitionTime: "2020-04-14T08:07:36Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: ReconcileComplete
  - lastHeartbeatTime: "2020-04-14T11:27:57Z"
    lastTransitionTime: "2020-04-14T08:22:17Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: Available
  - lastHeartbeatTime: "2020-04-14T11:27:57Z"
    lastTransitionTime: "2020-04-14T09:49:03Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2020-04-14T11:27:57Z"
    lastTransitionTime: "2020-04-14T08:07:33Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2020-04-14T11:27:57Z"
    lastTransitionTime: "2020-04-14T09:49:03Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: Upgradeable
  failureDomain: rack
  nodeTopologies:
    labels:
      topology.rook.io/rack:
      - rack0
      - rack1
      - rack2
  phase: Ready
  relatedObjects:
  - apiVersion: ceph.rook.io/v1
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    namespace: openshift-storage
    resourceVersion: "99998"
    uid: ce9c3b9e-29ae-455c-88a2-8318afe69667
  - apiVersion: noobaa.io/v1alpha1
    kind: NooBaa
    name: noobaa
    namespace: openshift-storage
    resourceVersion: "29784"
    uid: 03907609-6011-495d-a0d6-1db9ddcaa077
  storageClassesCreated: true


+ StorageCluster CR after scaling up using different SC:
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  creationTimestamp: "2020-04-14T08:07:33Z"
  generation: 2
  name: ocs-storagecluster
  namespace: openshift-storage
  resourceVersion: "102226"
  selfLink: /apis/ocs.openshift.io/v1/namespaces/openshift-storage/storageclusters/ocs-storagecluster
  uid: eb058c82-6c99-4152-a19b-d60ca6d99b6e
spec:
  manageNodes: false
  storageDeviceSets:
  - count: 2
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2Ti
        storageClassName: thin
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: true
    replica: 3
    resources: {}
status:
  cephBlockPoolsCreated: true
  cephFilesystemsCreated: true
  cephObjectStoreUsersCreated: true
  cephObjectStoresCreated: true
  conditions:
  - lastHeartbeatTime: "2020-04-14T11:33:02Z"
    lastTransitionTime: "2020-04-14T08:07:36Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: ReconcileComplete
  - lastHeartbeatTime: "2020-04-14T11:33:02Z"
    lastTransitionTime: "2020-04-14T08:22:17Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: Available
  - lastHeartbeatTime: "2020-04-14T11:33:02Z"
    lastTransitionTime: "2020-04-14T11:31:04Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2020-04-14T11:33:02Z"
    lastTransitionTime: "2020-04-14T08:07:33Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2020-04-14T11:33:02Z"
    lastTransitionTime: "2020-04-14T11:31:04Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: Upgradeable
  failureDomain: rack
  nodeTopologies:
    labels:
      topology.rook.io/rack:
      - rack0
      - rack1
      - rack2
  phase: Ready
  relatedObjects:
  - apiVersion: ceph.rook.io/v1
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    namespace: openshift-storage
    resourceVersion: "102217"
    uid: ce9c3b9e-29ae-455c-88a2-8318afe69667
  - apiVersion: noobaa.io/v1alpha1
    kind: NooBaa
    name: noobaa
    namespace: openshift-storage
    resourceVersion: "29784"
    uid: 03907609-6011-495d-a0d6-1db9ddcaa077
  storageClassesCreated: true


+ must-gather location:
  https://drive.google.com/open?id=1BlkuaXpAGiVh8_ZsRKwGEzC0331rBKwe

Actual results:


Expected results:


Additional info:

Comment 4 Feras Al Taher 2020-04-17 10:21:04 UTC
Hey Bipin,

I am surprised that you closed this bug. First of all, the documentation and public statements for OCS 4.3 explain that you can choose a different StorageClass. Can you please confirm that OCS 4.3 works as expected and that this issue is fixed in OCS 4.3? This is a bug because the functionality does not work in OCS 4.2.

I advise you to first track whether this bug was fixed in OCS 4.3, because if it was not, then you should at least block users from choosing a different storage class in the OCS portal.

Comment 8 Raz Tamir 2020-04-17 17:25:07 UTC
The documentation https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html-single/managing_openshift_container_storage/index#proc_scaling-up-storage-by-adding-capacity-to-your-openshift-container-storage-nodes_rhocs


clearly says: "From this dialog box, you can set the requested additional capacity and the storage class. The size should always be set in multiples of 2TiB. On AWS, the storage class should be set to gp2. On VMWare, the storage class should be set to thin."

Based on this, selecting a storage class other than thin or gp2 is not recommended.

Comment 9 Red Hat Bugzilla 2024-01-06 04:28:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days