Description of problem:

The prometheus-operator pod should, like all CVO-managed components without a special exception, tolerate masters, but it does not.

Version-Release number of selected component (if applicable):

```console
$ oc adm release info --commits | grep 'monitor\|prometheus'
  cluster-monitoring-operator   https://github.com/openshift/cluster-monitoring-operator   bfacf9ed62a9ba0fcdaa93b280e8f05356c637d6
  k8s-prometheus-adapter        https://github.com/openshift/k8s-prometheus-adapter        19f9a956abcf670c44b2d49ed90cc337d94b4b9d
  prometheus                    https://github.com/openshift/prometheus                    41e87f1bb0d0c2fa530417ba27008fa25b74f5e9
  prometheus-alertmanager       https://github.com/openshift/prometheus-alertmanager       4617d5502332dc41c9c885cc12ecde5069191f73
  prometheus-config-reloader    https://github.com/openshift/prometheus-operator           f8a0aa170bf81ef70e16875053573a037461042d
  prometheus-node-exporter      https://github.com/openshift/node_exporter                 f248b582878226c8a8cd650223cf981cc556eb44
  prometheus-operator           https://github.com/openshift/prometheus-operator           f8a0aa170bf81ef70e16875053573a037461042d
```

How reproducible:

Every time.

Steps to Reproduce:
1. Break your Machine API provider, e.g. by running libvirt with a non-standard volume pool before [1] lands.
2. Launch a cluster.
3. Wait for things to stabilize.

Then:

```console
$ oc get pods --all-namespaces | grep Pending
openshift-ingress                      router-default-7688479d99-nbnj8        0/1   Pending   0   31m
openshift-monitoring                   prometheus-operator-647d84b5c6-rsplb   0/1   Pending   0   31m
openshift-operator-lifecycle-manager   olm-operators-sf5sm                    0/1   Pending   0   36m
$ oc get pod -o "jsonpath={.status.conditions}{'\n'}" -n openshift-monitoring prometheus-operator-647d84b5c6-rsplb
[map[lastTransitionTime:2019-01-30T20:00:03Z reason:Unschedulable message:0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. type:PodScheduled status:False lastProbeTime:<nil>]]
$ oc get deployment -o yaml -n openshift-monitoring prometheus-operator
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2019-01-30T20:00:03Z
  generation: 11
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: openshift-monitoring
  resourceVersion: "27979"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-monitoring/deployments/prometheus-operator
  uid: a1e04b14-24c9-11e9-8d1a-52fdfc072182
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-30-150036@sha256:ff927b3030ea14c5ffb591e1178f92ba7c4da1a0a4ca8098cd466ccf23bb761a
        - --prometheus-config-reloader=registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-30-150036@sha256:1628ab9c7452dfe599240f053657fdd6ac1573ffa1e762b949ab388731d5f0e3
        - --namespaces=openshift-apiserver-operator,openshift-controller-manager,openshift-controller-manager-operator,openshift-image-registry,openshift-kube-apiserver-operator,openshift-kube-controller-manager-operator,openshift-monitoring
        image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-30-150036@sha256:57c8ba286b7f9aaff419de87b06f20e672e81fc85c978a36e9c3ba491a66f763
        imagePullPolicy: IfNotPresent
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources: {}
        securityContext:
          procMount: Default
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: prometheus-operator
      serviceAccountName: prometheus-operator
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: 2019-01-30T20:00:03Z
    lastUpdateTime: 2019-01-30T20:00:03Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-30T20:10:04Z
    lastUpdateTime: 2019-01-30T20:10:04Z
    message: ReplicaSet "prometheus-operator-647d84b5c6" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 11
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
```

Actual results:

A Pending pod with "0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate."

Expected results:

A running pod.

Additional info:

The "high" severity is based on Clayton's request [2].

[1]: https://github.com/openshift/cluster-api-provider-libvirt/pull/45
[2]: https://github.com/openshift/installer/pull/1146#issuecomment-459037176
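To confirm which taint is in play, you can dump the taint keys per node (node names will differ per cluster; on a cluster in this state, the only schedulable node is a master carrying the node-role.kubernetes.io/master:NoSchedule taint):

```console
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
```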
Thanks for the in-depth description, Trevor. Would adding:

```yaml
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
```

be enough (like we do for the node exporter [1])? Or do we need to prefer workers but also tolerate masters?

[1] https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/node-exporter/daemonset.yaml#L76-L78
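For anyone who wants to try it on a live cluster before a fix ships, a one-off JSON patch along these lines should add the toleration (an untested sketch; the cluster-monitoring-operator will likely revert the change on its next sync):

```console
$ oc -n openshift-monitoring patch deployment prometheus-operator --type=json -p '[
    {"op": "add",
     "path": "/spec/template/spec/tolerations",
     "value": [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]}
  ]'
```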
This is a feature, so scheduling it for the next release.
Looks like you adjusted the toleration in the node exporter recently [1], but yeah, I think that's all you need. And I think we can leave scheduling up to Kubernetes instead of setting a compute preference. This is not really my wheelhouse, though, so take my advice with a grain of salt ;).

[1]: https://github.com/openshift/cluster-monitoring-operator/commit/fd3e4bf74cdcdaa27ff15f0055ea4359e227408f#diff-ea42f2a19bba3e07005d6b437ed1a902L77
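For the record, if you ever did want to prefer workers while still tolerating masters, my understanding is that it would be a soft node affinity on the worker role label, roughly like this (a sketch assuming workers carry the standard node-role.kubernetes.io/worker label; again, grain of salt):

```yaml
tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule
affinity:
  nodeAffinity:
    # Soft preference: land on workers when possible, but fall back
    # to (tolerated) masters if no worker can take the pod.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: node-role.kubernetes.io/worker
          operator: Exists
```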