Bug 1721922 - alertmanager-main pod does not apply the toleration config
Summary: alertmanager-main pod does not apply the toleration config
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Pawel Krupa
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-19 09:10 UTC by Junqi Zhao
Modified: 2019-10-16 06:32 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:32:02 UTC
Target Upstream Version:
Embargoed:


Attachments
monitoring dump (267.85 KB, application/gzip)
2019-07-02 02:14 UTC, Junqi Zhao
controller_scheduler logs (31.51 KB, application/gzip)
2019-07-02 07:55 UTC, Junqi Zhao


Links
System ID Private Priority Status Summary Last Updated
Github coreos prometheus-operator pull 2657 0 None closed pkg/alertmanager: increase terminationGracePeriod to 120 2021-01-13 13:27:23 UTC
Github coreos prometheus-operator pull 2676 0 None closed pkg/alertmanager: change podManagementPolicy to parallel to prevent statefulset reconciliation from hanging 2021-01-13 13:26:42 UTC
Github openshift prometheus-operator pull 33 0 None closed Bug 1721922: Upgrade to prometheus-operator 0.31.1 2021-01-13 13:26:43 UTC
Github openshift prometheus-operator pull 35 0 None closed Synchronize with upstream master branch 2021-01-13 13:27:23 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:32:16 UTC

Description Junqi Zhao 2019-06-19 09:10:07 UTC
Description of problem:
Label all nodes with monitoring=true:
# for i in $(kubectl get node --no-headers | awk '{print $1}'); do echo $i; kubectl label node $i monitoring=true --overwrite=true;done
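
The labels can be verified with, for example:
# kubectl get node -l monitoring=true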

Taint all nodes with NoExecute except the node where the cluster-monitoring-operator pod is running:
# for i in $(kubectl get node --no-headers | grep -v  ${cluster-monitoring-operator_node} | awk '{print $1}'); do echo $i;  kubectl taint nodes $i monitoring=true:NoExecute; done
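
The applied taints can be double-checked with something like:
# kubectl get node -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints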
	
Create toleration.yaml to tolerate the monitoring=true:NoExecute taint (no nodeSelector needs to be set for node-exporter); the content is below:
************************************************
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      nodeSelector:
        monitoring: true
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
    grafana:
      nodeSelector:
        monitoring: true
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
    kubeStateMetrics:
      nodeSelector:
        monitoring: true
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
    k8sPrometheusAdapter:
      nodeSelector:
        monitoring: true
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
    nodeExporter:
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
    prometheusK8s:
      nodeSelector:
        monitoring: true
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
    prometheusOperator:
      nodeSelector:
        monitoring: true
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
    telemeterClient:
      nodeSelector:
        monitoring: true
      tolerations:
      - key: "monitoring"
        operator: "Exists"
        effect: "NoExecute"
************************************************
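With the above saved as toleration.yaml, the configmap is created with:
# oc apply -f toleration.yaml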
After creating the cluster-monitoring-config configmap, alertmanager-main-0 stays in Pending status:
# oc -n openshift-monitoring get pod
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            0/3     Pending   0          8m38s
cluster-monitoring-operator-6cd9cd6d86-xf6rg   1/1     Running   0          27m
grafana-5967db758b-c8f7z                       2/2     Running   0          2m10s
kube-state-metrics-757745bf56-swvmv            3/3     Running   0          2m21s
node-exporter-46b98                            2/2     Running   0          27m
node-exporter-bljh5                            2/2     Running   0          27m
node-exporter-j9cxh                            2/2     Running   0          27m
node-exporter-mxnvm                            2/2     Running   0          27m
node-exporter-nkrr5                            2/2     Running   0          27m
node-exporter-w4z2k                            2/2     Running   0          27m
prometheus-adapter-7584485777-8j4jm            1/1     Running   0          2m11s
prometheus-adapter-7584485777-dkrhm            1/1     Running   0          2m3s
prometheus-k8s-0                               6/6     Running   1          108s
prometheus-k8s-1                               6/6     Running   1          119s
prometheus-operator-5859876475-8kf8z           1/1     Running   0          2m21s
telemeter-client-68fd784876-8fcxt              3/3     Running   0          2m15s
************************************************
# oc -n openshift-monitoring describe pod alertmanager-main-0
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                         From               Message
  ----     ------            ----                        ----               -------
  Warning  FailedScheduling  <invalid> (x18 over 5m17s)  default-scheduler  0/6 nodes are available: 6 node(s) had taints that the pod didn't tolerate.
************************************************
There is no key: "monitoring" toleration in the alertmanager-main-0 pod:
# oc -n openshift-monitoring get pod alertmanager-main-0 -oyaml | grep tolerations -A13
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: config-volume
***************************************************************************
The key: "monitoring" toleration is already present in the alertmanager-main statefulset:
# oc -n openshift-monitoring get statefulset alertmanager-main -oyaml | grep tolerations -A13
      tolerations:
      - effect: NoExecute
        key: monitoring
        operator: Exists
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
      - name: secret-alertmanager-main-tls
        secret:
          defaultMode: 420
          secretName: alertmanager-main-tls
      - name: secret-alertmanager-main-proxy
***************************************************************************
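The mismatch between the statefulset template and the running pod can also be seen by comparing the two directly, for example:
# oc -n openshift-monitoring get statefulset alertmanager-main -o jsonpath='{.spec.template.spec.tolerations}'
# oc -n openshift-monitoring get pod alertmanager-main-0 -o jsonpath='{.spec.tolerations}'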
After deleting the alertmanager-main-0 pod, the alertmanager-main pods start up:
# oc -n openshift-monitoring delete pod alertmanager-main-0
pod "alertmanager-main-0" deleted


# oc -n openshift-monitoring get pod
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3     Running   0          3m8s
alertmanager-main-1                            3/3     Running   0          2m59s
alertmanager-main-2                            3/3     Running   0          2m50s
cluster-monitoring-operator-6cd9cd6d86-xf6rg   1/1     Running   0          33m
grafana-5967db758b-c8f7z                       2/2     Running   0          7m40s
kube-state-metrics-757745bf56-swvmv            3/3     Running   0          7m51s
node-exporter-46b98                            2/2     Running   0          33m
node-exporter-bljh5                            2/2     Running   0          32m
node-exporter-j9cxh                            2/2     Running   0          32m
node-exporter-mxnvm                            2/2     Running   0          33m
node-exporter-nkrr5                            2/2     Running   0          33m
node-exporter-w4z2k                            2/2     Running   0          32m
prometheus-adapter-7584485777-8j4jm            1/1     Running   0          7m41s
prometheus-adapter-7584485777-dkrhm            1/1     Running   0          7m33s
prometheus-k8s-0                               6/6     Running   1          7m18s
prometheus-k8s-1                               6/6     Running   1          7m29s
prometheus-operator-5859876475-8kf8z           1/1     Running   0          7m51s
telemeter-client-68fd784876-8fcxt              3/3     Running   0          7m45s

The key: "monitoring" toleration is now present in the alertmanager-main pods:
# oc -n openshift-monitoring get pod alertmanager-main-0 -oyaml | grep tolerations -A15
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: monitoring
    operator: Exists
  volumes:

Version-Release number of selected component (if applicable):
4.2.0-0.ci-2019-06-19-023510

How reproducible:
Always

Steps to Reproduce:
1. See the description above.

Actual results:
The toleration configured in cluster-monitoring-config is not applied to the alertmanager-main pods; alertmanager-main-0 stays Pending until the pod is deleted manually.

Expected results:
The alertmanager-main pods pick up the configured toleration and are scheduled without manual intervention.

Additional info:

Comment 4 Junqi Zhao 2019-07-01 09:17:00 UTC
The PRs are merged, but the issue is not fixed. Payload: 4.2.0-0.nightly-2019-06-30-221852
# oc -n openshift-monitoring logs prometheus-operator-54cddf9c9d-dd977 | grep "Starting Prometheus Operator version"
ts=2019-07-01T09:07:05.850590352Z caller=main.go:181 msg="Starting Prometheus Operator version '0.31.1'."


# oc -n openshift-monitoring get pod alertmanager-main-0 -oyaml | grep terminationGracePeriod
  terminationGracePeriodSeconds: 120

# oc -n openshift-monitoring get pod
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            0/3     Pending   0          5m12s


# oc -n openshift-monitoring describe pod alertmanager-main-0
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  44s (x8 over 2m28s)  default-scheduler  0/6 nodes are available: 6 node(s) had taints that the pod didn't tolerate.

# oc -n openshift-monitoring get pod alertmanager-main-0 -oyaml | grep tolerations -A13
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: config-volume
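
One additional data point that may help: whether the pod is still running the statefulset's old revision, e.g.:
# oc -n openshift-monitoring get statefulset alertmanager-main -o jsonpath='{.status.currentRevision} {.status.updateRevision}{"\n"}'
# oc -n openshift-monitoring get pod alertmanager-main-0 -L controller-revision-hash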

After deleting the alertmanager-main-0 pod, the toleration config is applied:
#  oc -n openshift-monitoring delete pod alertmanager-main-0
pod "alertmanager-main-0" deleted

# oc -n openshift-monitoring get pod | grep alertmanager-main
alertmanager-main-0                            3/3     Running   0          45s
alertmanager-main-1                            3/3     Running   0          36s
alertmanager-main-2                            3/3     Running   0          27s

# oc -n openshift-monitoring get pod alertmanager-main-0 -oyaml | grep tolerations -A13
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: monitoring
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists

Comment 5 Frederic Branczyk 2019-07-01 12:28:36 UTC
Could you share prometheus-operator logs as well as the statefulset generated by the prometheus-operator?
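
For example (assuming the operator container is named prometheus-operator):
# oc -n openshift-monitoring logs deploy/prometheus-operator -c prometheus-operator
# oc -n openshift-monitoring get statefulset alertmanager-main -o yaml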

Comment 6 Junqi Zhao 2019-07-02 02:14:31 UTC
Created attachment 1586479 [details]
monitoring dump

Comment 10 Junqi Zhao 2019-07-02 07:55:24 UTC
Created attachment 1586560 [details]
controller_scheduler logs

Comment 12 Frederic Branczyk 2019-07-02 07:59:26 UTC
That looks like the openshift-controller-manager; could you share the kube-controller-manager logs?
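
Something along these lines should collect them (pod and container names assumed, adjust per node):
# oc -n openshift-kube-controller-manager get pods
# oc -n openshift-kube-controller-manager logs kube-controller-manager-<node-name> -c kube-controller-manager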

Comment 17 Pawel Krupa 2019-07-26 09:54:56 UTC
Looks like https://github.com/openshift/prometheus-operator/pull/35 fixed things.
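
For verification, the statefulset's podManagementPolicy (which the upstream fix changed to Parallel) can be checked with something like:
# oc -n openshift-monitoring get statefulset alertmanager-main -o jsonpath='{.spec.podManagementPolicy}{"\n"}'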

Comment 19 Junqi Zhao 2019-07-31 02:02:15 UTC
Following the steps in Comment 0, the alertmanager-main pods now apply the toleration config:
# oc -n openshift-monitoring get pod | grep alertmanager-main
alertmanager-main-0                            3/3     Running   0          55m
alertmanager-main-1                            3/3     Running   0          55m
alertmanager-main-2                            3/3     Running   0          55m

payload: 4.2.0-0.nightly-2019-07-29-154123
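
The monitoring toleration on each pod can be confirmed with, for example:
# for p in alertmanager-main-0 alertmanager-main-1 alertmanager-main-2; do oc -n openshift-monitoring get pod $p -o jsonpath='{.spec.tolerations[?(@.key=="monitoring")]}'; echo; done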

Comment 20 errata-xmlrpc 2019-10-16 06:32:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

