Bug 2254035
| Summary: | OSD pod scheduling is inconsistent in multiple device set scenarios | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | T K Chandra Hasan <tkhasan> |
| Component: | ocs-operator | Assignee: | Malay Kumar parida <mparida> |
| Status: | ASSIGNED --- | QA Contact: | Elad <ebenahar> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.13 | CC: | assingh, bkunal, etamir, mparida, muagarwa, nberry, nigoyal, odf-bz-bot, sapillai, tnielsen |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-12-19 09:45:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
T K Chandra Hasan
2023-12-11 16:25:46 UTC
I have collected the ODF must-gather but am not able to attach it here as it exceeds the size limit. Is there another way to upload the logs?

(In reply to T K Chandra Hasan from comment #0)

> Description of problem (please be detailed as possible and provide log snippets): In the IBM Cloud ROKS cluster, we were validating the multiple device set feature and are observing inconsistency in OSD pod scheduling. We are following this article to create device sets: https://access.redhat.com/articles/6214381

If I'm reading it correctly, the article helps to segregate nodes based on labels so that customers can run workloads on those particular nodes. It does not say that the workload will be evenly distributed across the labeled nodes. Also, this article is from 4.8. We now use `topologySpreadConstraints` to evenly distribute the OSD prepare pods and OSD pods across nodes. You can refer to the `storageClassDeviceSets` in the `cephCluster` YAML below. Only the ocs-deviceset entries (ocs-deviceset-0, ocs-deviceset-1, ocs-deviceset-2) seem to have `topologySpreadConstraints` set.
```
storage: storageClassDeviceSets:
- count: 1 name: ocs-deviceset-0 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: "" spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-10iops-tier volumeMode: Block status: {}
- count: 1 name: ocs-deviceset-1 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway portable: 
true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: "" spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-10iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-2 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule - labelSelector: matchExpressions: - key: ceph.rook.io/pvc 
operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: "" spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-10iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-2-0 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: "" spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-5iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-2-1 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - 
matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: "" spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-5iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-2-2 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: 
cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - effect: NoSchedule key: node.ocs.openshift.io/storage operator: Equal value: "true" topologySpreadConstraints: - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule - labelSelector: matchExpressions: - key: ceph.rook.io/pvc operator: Exists maxSkew: 1 topologyKey: kubernetes.io/hostname whenUnsatisfiable: ScheduleAnyway resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: "" spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-5iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-3-0 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-3 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-3 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-3 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-5iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-3-1 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-3 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: 
- key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-3 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-3 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-5iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-3-2 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-3 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-3 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-3 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-5iops-tier volumeMode: Block status: {} - count: 1 name: ocs-deviceset-4-0 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-4 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-4 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-4 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-general-purpose 
volumeMode: Block status: {} - count: 1 name: ocs-deviceset-4-1 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-4 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-4 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-4 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-general-purpose volumeMode: Block status: {} - count: 1 name: ocs-deviceset-4-2 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-4 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-4 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-4 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-general-purpose volumeMode: Block status: {} - count: 1 name: ocs-deviceset-5-0 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-5 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: 
- matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-5 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-5 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-general-purpose volumeMode: Block status: {} - count: 1 name: ocs-deviceset-5-1 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-5 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-5 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-5 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: ibmc-vpc-block-metro-general-purpose volumeMode: Block status: {} - count: 1 name: ocs-deviceset-5-2 placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-5 portable: true preparePlacement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage-device-class operator: In values: - deviceset-5 resources: limits: cpu: "2" memory: 5Gi requests: cpu: "2" memory: 5Gi tuneFastDeviceClass: true volumeClaimTemplates: - metadata: annotations: crushDeviceClass: deviceset-5 spec: accessModes: - ReadWriteOnce resources: requests: storage: 512Gi storageClassName: 
ibmc-vpc-block-metro-general-purpose volumeMode: Block status: {}
```

(In reply to Santosh Pillai from comment #6)

Thank you, Santosh. I understand the article isn't up to date, but how do the OSD pods get distributed evenly when we use a different storage class name? All the worker nodes are newly configured with the same flavors. Could you provide a sample device set snippet with `topologySpreadConstraints` which I can try and check?

(In reply to T K Chandra Hasan from comment #7)

The article states that all the OSDs with deviceClass `set1` will be created on nodes with the `cluster.ocs.openshift.io/openshift-storage-device-class: set1` label. The user can then create a block pool using the deviceClass `set1`:

```
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: set1-pool
  namespace: openshift-storage
spec:
  deviceClass: set1
  parameters:
```

and then create a storage class using this block pool name:

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: set1-sc
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  pool: set1-pool
```

So all the PVs using the `set1-sc` storage class will have their data restricted to nodes with the label `cluster.ocs.openshift.io/openshift-storage-device-class: set1`. So I don't think the article is suggesting that OSDs will be evenly distributed on each node, just that the workload will be spread across specific storage nodes. As for a sample device set snippet with `topologySpreadConstraints`: currently I don't have one; I'll get back to you.

Yesterday I spent some time going through the ocs and ceph operator code on GitHub and have the following concern:

https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storagecluster/cephcluster.go#L750:L756

As per the above lines, the placement variable is set to true only when no parameters are specified in the device set, which includes the topologySpreadConstraints as well. Ideally, if the user hasn't specified topologySpreadConstraints, the defaults should still be picked up even when other parameters are specified. This would probably fix the pod scheduling issue in this case.

Tried the following topologySpreadConstraints, which works fine. It would be great if the operator handled this as well when it is not specified:
```
- config: {}
  count: 1
  dataPVCTemplate:
    metadata: {}
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 512Gi
      storageClassName: ibmc-vpc-block-metro-10iops-tier
      volumeMode: Block
    status: {}
  deviceClass: deviceset-2
  name: ocs-deviceset-2
  placement:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage-device-class
            operator: In
            values:
            - deviceset-2
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
    topologySpreadConstraints:
    - labelSelector:
        matchExpressions:
        - key: ceph.rook.io/pvc
          operator: Exists
      maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
  portable: true
  preparePlacement:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage-device-class
            operator: In
            values:
            - deviceset-2
    tolerations:
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
    topologySpreadConstraints:
    - labelSelector:
        matchExpressions:
        - key: ceph.rook.io/pvc
          operator: Exists
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
    - labelSelector:
        matchExpressions:
        - key: ceph.rook.io/pvc
          operator: Exists
      maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
  replica: 3
  resources: {}
```
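For completeness, the CephBlockPool → StorageClass chain quoted earlier is consumed by an application PVC. A minimal sketch (the `set1-sc` name comes from the article's example; the PVC name and namespace are hypothetical, for illustration only):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: set1-data        # hypothetical claim name
  namespace: my-app      # hypothetical namespace
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: set1-sc   # backed by set1-pool, deviceClass set1
```

Data written through this claim lands only on OSDs whose CRUSH device class is `set1`, i.e. on the labeled nodes.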
Good to know it's working based on comment #10.

(In reply to T K Chandra Hasan from comment #9)

TopologySpreadConstraints (supportTSC) will always be enabled based on your current k8s server version. So if you are not providing anything in deviceSet.Placement — that is, `ds.Placement.NodeAffinity`, `ds.Placement.PodAffinity`, `ds.Placement.PodAntiAffinity`, and `ds.Placement.TopologySpreadConstraints` are all nil — then the OCS operator will use the defaults mentioned here: https://github.com/red-hat-storage/ocs-operator/blob/442ac957f5606c46c6f1c8401eb22b4e57e65ef0/controllers/defaults/placements.go#L58-L73. But since you have already provided placement.NodeAffinity (for example, in deviceSet-3), the operator is not using the defaults and uses what you have provided in the CR. So IMO, the operator is working as expected. You should edit the StorageCluster device set CR as you did in comment #10, providing both the nodeAffinity (to restrict the OSDs to nodes based on the `openshift-storage-device-class` labels) and the topologySpreadConstraints (to ensure OSDs are equally distributed among those nodes).

Moving it to the ODF operator team for analysis, as they control the creation of storageClassDeviceSets and the CephCluster CR.

Closing it as per comment 10.

This configuration setup needs to be documented before closing this bug.
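A sketch of what comment #11 implies: leave all four placement fields unset and the operator falls back to its default spread constraints. Field values below are reused from the earlier examples; the device set name is hypothetical:

```yaml
storageDeviceSets:
- name: ocs-deviceset-6      # hypothetical name
  count: 1
  replica: 3
  portable: true
  # No placement/preparePlacement: with nodeAffinity, podAffinity,
  # podAntiAffinity and topologySpreadConstraints all unset, the OCS
  # operator applies the defaults from placements.go (linked above).
  dataPVCTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 512Gi
      storageClassName: ibmc-vpc-block-metro-10iops-tier
      volumeMode: Block
```

The trade-off: the defaults give even spreading but no device-class node restriction, which is why comment #11 recommends providing both pieces explicitly when segregating nodes.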
I had a hard time understanding the behavior after going through the code.

Hi Travis, can you please take a look at this BZ, specifically https://bugzilla.redhat.com/show_bug.cgi?id=2254035#c11, and give your views on whether we should always enforce the TSC even when some placement specs are present, or whether we need a doc mention?

(In reply to Malay Kumar parida from comment #15)

Yes, let's always enforce the TSCs even if the placement is specified by the user. Without the TSCs, the OSDs will rarely be balanced.

Not a blocker for 4.15, moving out to 4.16.
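The constraint to be enforced is the stanza already used throughout this thread; shown standalone below for reference (always merging it into user-supplied placements is the proposed change, not current operator behavior):

```yaml
topologySpreadConstraints:
- labelSelector:
    matchExpressions:
    - key: ceph.rook.io/pvc       # matches OSD and OSD-prepare pods via their PVC label
      operator: Exists
  maxSkew: 1                      # at most one more OSD on any single host than the least-loaded one
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway   # prefer, but do not block scheduling
```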