Currently, thanos-querier can be scheduled on both master and worker nodes because only master tolerations are set. We should also set a master node selector so that thanos-querier is guaranteed to be deployed on master nodes.
Tested on 4.5.0-0.nightly-2020-03-18-115438. thanos-querier pods are deployed on workers now, and nodeSelector and tolerations can be configured for thanos-querier via the cluster-monitoring-config configmap.

Verification steps:

1. thanos-querier pods are deployed on workers now
# oc get node | grep worker
ip-10-0-134-182.ap-northeast-2.compute.internal   Ready   worker   9h   v1.17.1
ip-10-0-150-200.ap-northeast-2.compute.internal   Ready   worker   9h   v1.17.1
ip-10-0-173-240.ap-northeast-2.compute.internal   Ready   worker   9h   v1.17.1
# oc -n openshift-monitoring get pod -o wide | grep thanos-querier
thanos-querier-56f9d46b78-gzd5n   4/4   Running   0   9h   10.128.2.9    ip-10-0-173-240.ap-northeast-2.compute.internal   <none>   <none>
thanos-querier-56f9d46b78-j4rpr   4/4   Running   0   9h   10.129.2.13   ip-10-0-150-200.ap-northeast-2.compute.internal   <none>   <none>

2. Label the master nodes
# for i in $(oc get node | grep master | awk '{print $1}'); do echo $i; oc label node $i thanosQuerier=deploy;done

3. Add nodeSelector and tolerations to the cluster-monitoring-config configmap so that the pods can be deployed on master nodes
****
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    thanosQuerier:
      nodeSelector:
        thanosQuerier: deploy
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
****

4. Check that the configuration is present in the thanos-querier deployment, that the pods are scheduled to master nodes despite the NoSchedule taint, and that the thanos-querier pods work well
# oc -n openshift-monitoring get deploy thanos-querier -oyaml
...
      nodeSelector:
        thanosQuerier: deploy
...
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
..
# oc get node | grep master
ip-10-0-129-123.ap-northeast-2.compute.internal   Ready   master   10h   v1.17.1
ip-10-0-150-17.ap-northeast-2.compute.internal    Ready   master   10h   v1.17.1
ip-10-0-171-6.ap-northeast-2.compute.internal     Ready   master   10h   v1.17.1
# oc -n openshift-monitoring get pod -o wide | grep thanos-querier
thanos-querier-674cbbb6c7-9hzlt   4/4   Running   0   3m36s   10.129.0.50   ip-10-0-129-123.ap-northeast-2.compute.internal   <none>   <none>
thanos-querier-674cbbb6c7-n645t   4/4   Running   0   3m4s    10.130.0.50   ip-10-0-171-6.ap-northeast-2.compute.internal     <none>   <none>
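For anyone re-running this verification, the manual comparison in step 4 (pod node names against the master node list) could be automated along these lines. This is only a rough sketch, not part of the fix: the helper name pods_off_nodes is hypothetical, and it assumes the node name is the 7th whitespace-separated column of `oc get pod -o wide`, as in the output shown above. Other client versions may lay out the columns differently.

```shell
# pods_off_nodes (hypothetical helper): read `oc get pod -o wide` lines on
# stdin and print "<pod> <node>" for every pod whose node is NOT in the
# list of allowed node names passed as the first argument.
# Assumption: the node name is column 7, as in the output in this report.
pods_off_nodes() {
    # Join the (possibly newline-separated) node names into one
    # space-separated string so it can be passed safely via awk -v.
    allowed=$(printf '%s ' $1)
    awk -v allowed="$allowed" '
        BEGIN {
            # Build a lookup table of allowed node names.
            n = split(allowed, names, " ")
            for (i = 1; i <= n; i++) ok[names[i]] = 1
        }
        # Skip short or blank lines; report pods on any other node.
        NF >= 7 && !($7 in ok) { print $1, $7 }
    '
}
```

It could then be used with the same commands as in the steps above. Empty output would mean every thanos-querier pod is running on a master node:

    masters=$(oc get node | grep master | awk '{print $1}')
    oc -n openshift-monitoring get pod -o wide | grep thanos-querier | pods_off_nodes "$masters"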
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409