Bug 2279876
Summary: | [cee/sd][ODF] MDS pod scheduling blocked on hybrid clusters with two CephFS instances | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Anton Mark <amark> |
Component: | ocs-operator | Assignee: | Parth Arora <paarora> |
Status: | CLOSED ERRATA | QA Contact: | Nagendra Reddy <nagreddy> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.14 | CC: | bkunal, edonnell, etamir, hnallurv, msee, muagarwa, nagreddy, nberry, nigoyal, odf-bz-bot, paarora, tdesala, tnielsen |
Target Milestone: | --- | ||
Target Release: | ODF 4.17.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | 4.17.0-92 | Doc Type: | Enhancement |
Doc Text: |
.Support for creating multiple filesystems
This enhancement allows users to create multiple filesystems on the same cluster, for hybrid clusters or other use cases.
|
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2024-10-30 14:27:47 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 2281703 |
Description
Anton Mark
2024-05-09 13:41:59 UTC
The customer applied the change with mixed results. See below:

I've applied it to the pre-Prod cluster and it does resolve the problem in that the secondary MDS is running, but it ended up running on the same node:

```
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-677655dcqtjhk   2/2   Running   0   8m55s   11.18.17.136   k8sbm-1494886.ny.fw.gs.com     <none>   <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-599cbd6dv44qg   2/2   Running   0   4m10s   11.18.17.137   k8sbm-1494886.ny.fw.gs.com     <none>   <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-gcp-a-74f97w9vz   2/2   Running   0   30d     11.18.92.46    d158815-gcp-k8s27.sky.gs.com   <none>   <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-gcp-b-6579524gf   2/2   Running   0   30d     11.18.94.52    d158815-gcp-k8s29.sky.gs.com   <none>   <none>
```

Obviously, this isn't much redundancy, and it looks like the anti-affinity didn't actually take effect. Deleting one pod and letting it restart resulted in it landing on a different node, which is good:

```
~> kca-1018 -n openshift-storage delete pod rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-599cbd6dv44qg
pod "rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-599cbd6dv44qg" deleted
~> kc-1018 -n openshift-storage get pods -o wide | grep mds
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-677655dcqtjhk   2/2   Running   0   11m   11.18.17.136   k8sbm-1494886.ny.fw.gs.com     <none>   <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-599cbd6d7rqwq   2/2   Running   0   19s   11.18.13.186   k8sbm-1494920.ny.fw.gs.com     <none>   <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-gcp-a-74f97w9vz   2/2   Running   0   30d   11.18.92.46    d158815-gcp-k8s27.sky.gs.com   <none>   <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-gcp-b-6579524gf   2/2   Running   0   30d   11.18.94.52    d158815-gcp-k8s29.sky.gs.com   <none>   <none>
```

But the two nodes are in the same rack, so an upgrade or machine config change will hit both:

```
~> kc-get-node-labels 1018 k8sbm-1494886.ny.fw.gs.com
─────────────────────────────────────────────────┬────────────────────────────
admin.gs.com/firewall-policy                      │
admin.gs.com/legacy-hostname                      │
beta.kubernetes.io/arch                           │ amd64
beta.kubernetes.io/os                             │ linux
cluster.ocs.openshift.io/openshift-storage        │
firewall.gs.com/all                               │ true
gs.com/location_Building                          │ 1300FED
gs.com/requires-node-check                        │ 0
kubernetes.io/arch                                │ amd64
kubernetes.io/hostname                            │ k8sbm-1494886.ny.fw.gs.com
kubernetes.io/os                                  │ linux
node-role.kubernetes.io/worker                    │
node.kubernetes.io/ingress                        │ contour-auth
node.openshift.io/os_id                           │ rhcos
policy.gs.com/train-enabled                       │
remediation.medik8s.io/exclude-from-remediation   │ true
topology.kubernetes.io/zone                       │ compute0
topology.rook.io/rack                             │ rack0
─────────────────────────────────────────────────┴────────────────────────────

~> kc-get-node-labels 1018 k8sbm-1494920.ny.fw.gs.com
─────────────────────────────────────────────────┬────────────────────────────
admin.gs.com/firewall-policy                      │
admin.gs.com/legacy-hostname                      │
beta.kubernetes.io/arch                           │ amd64
beta.kubernetes.io/os                             │ linux
cluster.ocs.openshift.io/openshift-storage        │
firewall.gs.com/all                               │ true
gs.com/location_Building                          │ 1300FED
gs.com/requires-node-check                        │ 0
gs.com/vip-hostname                               │ k8sbm-1494920.ny.fw.gs.com
kubernetes.io/arch                                │ amd64
kubernetes.io/hostname                            │ k8sbm-1494920.ny.fw.gs.com
kubernetes.io/os                                  │ linux
node-role.kubernetes.io/worker                    │
node.kubernetes.io/ingress                        │ contour
node.kubernetes.io/role                           │ ingress
node.openshift.io/os_id                           │ rhcos
policy.gs.com/train-enabled                       │
remediation.medik8s.io/exclude-from-remediation   │ true
topology.kubernetes.io/zone                       │ compute0
topology.rook.io/rack                             │ rack0
─────────────────────────────────────────────────┴────────────────────────────
```

Which is less than ideal.

OK, what we really want is required anti-affinity, but only between the two MDS instances of the same filesystem. To be clear, you have two CephFS instances, correct? This would show the two instances:

```
oc get cephfilesystem
```

So you need to define the anti-affinity differently for each of those two instances. The placement for the first instance is controlled by the StorageCluster. But for the second CephFS instance, you created the CephFilesystem CR directly, right? In that case, please try:

- Edit the StorageCluster placement to use required anti-affinity for the label "app.kubernetes.io/part-of=ocs-storagecluster-cephfilesystem"
- Add placement to the 2nd CephFilesystem CR (since it's not controlled by the StorageCluster CR) to use required anti-affinity for the label "app.kubernetes.io/part-of=ocs-storagecluster-cephfilesystem-gcp"

They have two instances of CephFS:

```
ocs-storagecluster-cephfilesystem       1   293d   Ready
ocs-storagecluster-cephfilesystem-gcp   1   49d    Ready
```

To clarify, you're suggesting they have the StorageCluster CR handle the placement of MDS for ocs-storagecluster-cephfilesystem and the CephFilesystem CR handle placement for ocs-storagecluster-cephfilesystem-gcp?

Does it matter which? Could the StorageCluster CR handle either of the filesystems while the CephFilesystem CR handles the other?

The current CephFilesystem CR is attached to the case in supportshell.

(In reply to Matt See from comment #18)
> They have two instances of CephFS:
> ```
> ocs-storagecluster-cephfilesystem       1   293d   Ready
> ocs-storagecluster-cephfilesystem-gcp   1   49d    Ready
> ```
>
> To clarify, you're suggesting they have the storagecluster CR handle the
> placement of mds for ocs-storagecluster-cephfilesystem and the
> CephFilesystem CR handle placement for ocs-storagecluster-cephfilesystem-gcp?
>
> Does it matter which? Could the storagecluster CR handle either of the
> filesystems while the cephfilesystem CR handles the other?

Whichever CR owns creation of the filesystem also owns specifying its placement. So I would expect:

1. ocs-storagecluster-cephfilesystem was created by default by ODF, and its placement is owned by the StorageCluster CR.
2. ocs-storagecluster-cephfilesystem-gcp was created directly with a CephFilesystem CR (and is not controlled by any setting in the StorageCluster CR), therefore its placement needs to be specified in the CephFilesystem CR.

Please update the RDT flag/text appropriately.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
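For reference, a minimal sketch of the placement change suggested above, assuming the standard `spec.placement.mds` block of the StorageCluster CR and `spec.metadataServer.placement` of the CephFilesystem CR. The metadata names, the `activeCount`/`activeStandby` values, and the `topologyKey` are illustrative only and should be adapted to the actual CRs on the cluster (this is not the CR attached to the support case):

```yaml
# Sketch only: required MDS pod anti-affinity per filesystem, keyed on the
# app.kubernetes.io/part-of label mentioned in the comments above.
# 1) Default filesystem: placement is owned by the StorageCluster CR.
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  placement:
    mds:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/part-of
              operator: In
              values:
              - ocs-storagecluster-cephfilesystem
          # topology.rook.io/rack would additionally spread the two MDS pods
          # across racks (both nodes above were in rack0).
          topologyKey: kubernetes.io/hostname
---
# 2) Second filesystem: created directly, so placement lives in its own CR.
#    Keep the existing metadataPool/dataPools and metadataServer settings;
#    only the placement block is new here.
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: ocs-storagecluster-cephfilesystem-gcp
  namespace: openshift-storage
spec:
  metadataServer:
    activeCount: 1
    activeStandby: true
    placement:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/part-of
              operator: In
              values:
              - ocs-storagecluster-cephfilesystem-gcp
          topologyKey: kubernetes.io/hostname
```

With required (rather than preferred) anti-affinity, the scheduler refuses to co-locate the two MDS pods of a filesystem instead of merely preferring not to, which also means a pod will stay Pending if no second eligible node exists.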