Bug 1671137 - [Monitoring] openshift-monitoring prometheus-operator pods do not tolerate masters
Summary: [Monitoring] openshift-monitoring prometheus-operator pods do not tolerate masters
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Christian Heidenreich
QA Contact: Xiaoli Tian
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-30 20:40 UTC by W. Trevor King
Modified: 2019-05-21 09:36 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-21 09:36:15 UTC
Target Upstream Version:
Embargoed:



Description W. Trevor King 2019-01-30 20:40:47 UTC
Description of problem:

Like all CVO-managed components without a special exception, the prometheus-operator pod should tolerate masters, but it does not.
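
For context, the taint in question is the control-plane taint, which on a default install is node-role.kubernetes.io/master:NoSchedule. A quick way to see it (a sketch, assuming the standard master node-role label):

$ oc describe nodes -l node-role.kubernetes.io/master | grep -i taints

CVO-managed workloads generally carry a matching toleration in their pod spec; as shown in the deployment below, this one does not.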

Version-Release number of selected component (if applicable):

$ oc adm release info --commits | grep 'monitor\|prometheus'
  cluster-monitoring-operator                   https://github.com/openshift/cluster-monitoring-operator                   bfacf9ed62a9ba0fcdaa93b280e8f05356c637d6
  k8s-prometheus-adapter                        https://github.com/openshift/k8s-prometheus-adapter                        19f9a956abcf670c44b2d49ed90cc337d94b4b9d
  prometheus                                    https://github.com/openshift/prometheus                                    41e87f1bb0d0c2fa530417ba27008fa25b74f5e9
  prometheus-alertmanager                       https://github.com/openshift/prometheus-alertmanager                       4617d5502332dc41c9c885cc12ecde5069191f73
  prometheus-config-reloader                    https://github.com/openshift/prometheus-operator                           f8a0aa170bf81ef70e16875053573a037461042d
  prometheus-node-exporter                      https://github.com/openshift/node_exporter                                 f248b582878226c8a8cd650223cf981cc556eb44
  prometheus-operator                           https://github.com/openshift/prometheus-operator                           f8a0aa170bf81ef70e16875053573a037461042d

How reproducible:

Every time.

Steps to Reproduce:

1. Break your Machine API provider, e.g. by running libvirt with a non-standard volume pool before [1] lands.
2. Launch a cluster.
3. Wait for things to stabilize.

Then:

$ oc get pods --all-namespaces | grep Pending
openshift-ingress                            router-default-7688479d99-nbnj8                            0/1       Pending     0          31m
openshift-monitoring                         prometheus-operator-647d84b5c6-rsplb                       0/1       Pending     0          31m
openshift-operator-lifecycle-manager         olm-operators-sf5sm                                        0/1       Pending     0          36m
$ oc get pod -o "jsonpath={.status.conditions}{'\n'}" -n openshift-monitoring prometheus-operator-647d84b5c6-rsplb
[map[lastTransitionTime:2019-01-30T20:00:03Z reason:Unschedulable message:0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. type:PodScheduled status:False lastProbeTime:<nil>]]
$ oc get deployment -o yaml -n openshift-monitoring prometheus-operator
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2019-01-30T20:00:03Z
  generation: 11
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: openshift-monitoring
  resourceVersion: "27979"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-monitoring/deployments/prometheus-operator
  uid: a1e04b14-24c9-11e9-8d1a-52fdfc072182
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-30-150036@sha256:ff927b3030ea14c5ffb591e1178f92ba7c4da1a0a4ca8098cd466ccf23bb761a
        - --prometheus-config-reloader=registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-30-150036@sha256:1628ab9c7452dfe599240f053657fdd6ac1573ffa1e762b949ab388731d5f0e3
        - --namespaces=openshift-apiserver-operator,openshift-controller-manager,openshift-controller-manager-operator,openshift-image-registry,openshift-kube-apiserver-operator,openshift-kube-controller-manager-operator,openshift-monitoring
        image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-30-150036@sha256:57c8ba286b7f9aaff419de87b06f20e672e81fc85c978a36e9c3ba491a66f763
        imagePullPolicy: IfNotPresent
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources: {}
        securityContext:
          procMount: Default
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: prometheus-operator
      serviceAccountName: prometheus-operator
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: 2019-01-30T20:00:03Z
    lastUpdateTime: 2019-01-30T20:00:03Z
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-30T20:10:04Z
    lastUpdateTime: 2019-01-30T20:10:04Z
    message: ReplicaSet "prometheus-operator-647d84b5c6" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 11
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
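
Note that spec.template.spec above has no tolerations entry at all; querying the field directly (same jsonpath style as above) should come back empty:

$ oc get deployment -o "jsonpath={.spec.template.spec.tolerations}{'\n'}" -n openshift-monitoring prometheus-operator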

Actual results:

Pending pod with "0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate."

Expected results:

A running pod.

Additional info:

"high" severity is based on Clayton's request [2].

[1]: https://github.com/openshift/cluster-api-provider-libvirt/pull/45
[2]: https://github.com/openshift/installer/pull/1146#issuecomment-459037176

Comment 1 minden 2019-01-31 16:44:45 UTC
Thanks for the in-depth description, Trevor. Would adding:

```yaml
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
```

be enough (like we do for node exporter [1])? Or do we need to prefer workers but also tolerate masters?

[1] https://github.com/openshift/cluster-monitoring-operator/blob/master/assets/node-exporter/daemonset.yaml#L76-L78
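
For illustration only, the "prefer workers but also tolerate masters" variant would be the same toleration plus a preferred node affinity, roughly like this (a sketch, assuming the standard node-role.kubernetes.io/worker label):

```yaml
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              # prefer nodes carrying the worker role, but do not require it
              - key: node-role.kubernetes.io/worker
                operator: Exists
```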

Comment 2 Frederic Branczyk 2019-02-04 15:07:03 UTC
This is a feature, so scheduling it for the next release.

Comment 3 W. Trevor King 2019-03-06 00:36:40 UTC
Looks like you adjusted the toleration in the node-exporter recently [1], but yeah, I think that's all you need.  And I think we can leave scheduling up to Kubernetes instead of setting a compute preference.  This is not really my wheelhouse though, so take my advice with a grain of salt ;).

[1]: https://github.com/openshift/cluster-monitoring-operator/commit/fd3e4bf74cdcdaa27ff15f0055ea4359e227408f#diff-ea42f2a19bba3e07005d6b437ed1a902L77
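
To make that concrete, the change would just add the toleration to the pod template of the deployment shown in the description, roughly (sketch of the relevant spec.template.spec fragment, not the actual patch):

```yaml
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
```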

