Bug 1891856
| Summary: | ocs-metrics-exporter pod should have tolerations for OCS taint | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Neha Berry <nberry> |
| Component: | ocs-operator | Assignee: | Jose A. Rivera <jarrpa> |
| Status: | CLOSED ERRATA | QA Contact: | Shrivaibavi Raghaventhiran <sraghave> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | ebenahar, madam, muagarwa, ocs-bugs, sostapov, uchapaga |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | OCS 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.6.0-149.ci | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-17 06:25:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Neha Berry, 2020-10-27 14:20:05 UTC)
Tested on an infra nodes setup: the ocs-metrics-exporter pod has a toleration but was running on a non-OCS node. I respun the ocs-metrics-exporter pod, but it still runs on the same node and did not migrate to the infra nodes. Since the ocs-metrics-exporter pod has a toleration for the OCS taint, my expectation was that it would run on the infra nodes after the respin.

If the above is not expected, please clarify other ways to verify the behavior. Raising a needinfo on the same. @neha @umanga

Versions:
----------
4.6.0-0.nightly-2020-10-14-095718
ocs-operator.v4.6.0-152.ci

Console output:
----------------
$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.6.0-152.ci   OpenShift Container Storage   4.6.0-152.ci   ocs-operator.v4.6.0-144.ci   Succeeded

$ oc get nodes --show-labels | grep ocs
compute-0   Ready   infra,worker   20d   v1.19.0+d59ce34   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack2
compute-1   Ready   infra,worker   20d   v1.19.0+d59ce34   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack0
compute-2   Ready   infra,worker   20d   v1.19.0+d59ce34   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/infra=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.rook.io/rack=rack1

ocs-metrics-exporter
====================
      f:tolerations: {}
  manager: olm
  operation: Update
  time: "2020-11-03T10:23:35Z"
name: ocs-metrics-exporter
namespace: openshift-storage
ownerReferences:
- apiVersion: operators.coreos.com/v1alpha1
  blockOwnerDeletion: false
  controller: false
  kind: ClusterServiceVersion
  name: ocs-operator.v4.6.0-152.ci
  uid: 5b89c4e1-b273-4dac-83f1-698db1184a1f
resourceVersion: "28789890"
selfLink: /apis/apps/v1/namespaces/openshift-storage/deployments/ocs-metrics-exporter
uid: 6ff5e5ca-c57d-4e0f-8ac9-db487c29d787
spec:
--
tolerations:
- effect: NoSchedule
  key: node.ocs.openshift.io/storage
  operator: Equal
  value: "true"
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-10-28T07:23:45Z"
    lastUpdateTime: "2020-10-28T07:23:45Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-10-15T09:04:03Z"
    lastUpdateTime: "2020-11-03T10:23:02Z"
    message: ReplicaSet "ocs-metrics-exporter-6d9867695b" has successfully progressed.

$ oc get nodes
NAME              STATUS   ROLES          AGE   VERSION
compute-0         Ready    infra,worker   19d   v1.19.0+d59ce34
compute-1         Ready    infra,worker   19d   v1.19.0+d59ce34
compute-2         Ready    infra,worker   19d   v1.19.0+d59ce34
compute-3         Ready    worker         19d   v1.19.0+d59ce34
compute-4         Ready    worker         19d   v1.19.0+d59ce34
compute-5         Ready    worker         19d   v1.19.0+d59ce34
control-plane-0   Ready    master         19d   v1.19.0+d59ce34
control-plane-1   Ready    master         19d   v1.19.0+d59ce34
control-plane-2   Ready    master         19d   v1.19.0+d59ce34

$ oc get pods -n openshift-storage -o wide | grep ocs-metrics
ocs-metrics-exporter-6d9867695b-f4gft   1/1   Running   0   21h   10.128.3.130   compute-4   <none>   <none>

$ oc delete pod ocs-metrics-exporter-6d9867695b-f4gft -n openshift-storage
pod "ocs-metrics-exporter-6d9867695b-f4gft" deleted

$ oc get pods -n openshift-storage -o wide | grep ocs-metrics
ocs-metrics-exporter-6d9867695b-q2bqg   1/1   Running   0   39s   10.128.2.89   compute-4   <none>   <none>

This is expected. That's all taints and tolerations can do. It could run on infra nodes.
If we want to ensure that it does, we need node affinities, and that's a different issue.

This BZ is verified as per Comment 5.

Test environment:
-----------------
Infra-labelled and OCS-tainted nodes

Test steps:
-----------
1. The ocs-metrics pod was running on a non-OCS node
2. Cordoned the non-OCS workers
3. Respun the ocs-metrics-exporter pod
4. The ocs-metrics-exporter pod started running on an OCS node

Console output:
---------------
$ oc get pods -n openshift-storage -o wide | grep ocs-metrics
ocs-metrics-exporter-6d9867695b-q2bqg   1/1   Running   0   28h   10.128.2.89   compute-4   <none>   <none>

$ oc delete pod ocs-metrics-exporter-6d9867695b-q2bqg -n openshift-storage
pod "ocs-metrics-exporter-6d9867695b-q2bqg" deleted

$ oc get pods -n openshift-storage -o wide | grep ocs-metrics
ocs-metrics-exporter-6d9867695b-6cscf   1/1   Running   0   18s   10.131.0.28   compute-0   <none>   <none>

$ oc get nodes
NAME              STATUS                     ROLES          AGE   VERSION
compute-0         Ready                      infra,worker   21d   v1.19.0+d59ce34
compute-1         Ready                      infra,worker   21d   v1.19.0+d59ce34
compute-2         Ready                      infra,worker   21d   v1.19.0+d59ce34
compute-3         Ready,SchedulingDisabled   worker         21d   v1.19.0+d59ce34
compute-4         Ready,SchedulingDisabled   worker         21d   v1.19.0+d59ce34
compute-5         Ready,SchedulingDisabled   worker         21d   v1.19.0+d59ce34
control-plane-0   Ready                      master         21d   v1.19.0+d59ce34
control-plane-1   Ready                      master         21d   v1.19.0+d59ce34
control-plane-2   Ready                      master         21d   v1.19.0+d59ce34

With the above verifications and based on comment #5 and #6, moving this BZ to the verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605
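For context on the toleration-versus-affinity distinction discussed in the comments: a toleration only *permits* the pod to schedule onto OCS-tainted nodes, while a required node affinity would *force* it there. A minimal sketch of what pairing the deployment's existing toleration with such an affinity could look like, assuming the taint key and the `cluster.ocs.openshift.io/openshift-storage` node label seen in the output above (this is a hypothetical pod-template fragment for illustration, not the change shipped in this fix):

```yaml
# Hypothetical Deployment pod-template fragment (not part of this fix).
# The toleration lets the pod schedule onto OCS-tainted nodes; the
# nodeAffinity additionally restricts it to nodes carrying the OCS label
# shown in the `oc get nodes --show-labels` output above.
spec:
  template:
    spec:
      tolerations:
      - key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
        effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cluster.ocs.openshift.io/openshift-storage
                operator: Exists
```

With only the toleration (as shipped), the scheduler may still place the pod on any untainted worker, which is the behavior observed in this report; the affinity stanza is what the comment refers to as "a different issue".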