Bug 1807852

Summary: nodeSelector for openshift-state-metrics pod should be configurable.
Product: OpenShift Container Platform
Reporter: Masaki Furuta ( RH ) <mfuruta>
Component: Monitoring
Assignee: Pawel Krupa <pkrupa>
Status: CLOSED NOTABUG
QA Contact: Junqi Zhao <juzhao>
Severity: low
Docs Contact:
Priority: medium
Version: 4.2.0
CC: alegrand, anpicker, aos-bugs, erooth, gotoutkq, jokerman, juzhao, kakkoyun, lcosic, mloibl, msvistun, pkrupa, surbania, vigoyal
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1789305
Environment:
Last Closed: 2020-02-27 14:17:04 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1789305
Bug Blocks: 1808183, 1826181

Comment 1 Masaki Furuta ( RH ) 2020-02-27 11:32:34 UTC
Description of problem:

  nodeSelector for openshift-state-metrics pod should be configurable.

Version-Release number of selected component (if applicable):

  openshift v4.2

How reproducible:

  Always

Steps to Reproduce:

    0. Try to move the monitoring pods as described in comment #5; this succeeded only partially.
    1. Try to move the rest of the pods remaining on the worker nodes by following the steps suggested by Richard Vanderpool in private comment #6.
    2. Delete the pod with "oc delete pod openshift-state-metrics-xxxxxxxxxx-xxxxx -n openshift-monitoring"; the pod was started again on the worker node and did not move to the infra node.
    3. At this point, the pod spec of openshift-state-metrics-xxxxxxxxxx-xxxxx showed `spec.nodeSelector.kubernetes.io/os = linux`.
    4. Another pod that was scheduled on the infra node as expected showed `spec.nodeSelector.node-role.kubernetes.io/infra = ""`.
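
    For reference, the two selectors from steps 3 and 4 would look roughly like this in the respective pod specs. This is a sketch reconstructed from the description above, not copied from the attached `oc get pod ... -o yaml` output:

    ```yaml
    # openshift-state-metrics-xxxxxxxxxx-xxxxx (restarted on the worker node)
    spec:
      nodeSelector:
        kubernetes.io/os: linux
    ---
    # a monitoring pod that was scheduled on the infra node as expected
    spec:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ```

    The first selector matches any Linux node, so nothing in it steers the pod towards the infra nodes.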

Actual results:
    
    As described above, following the procedure in [1], I added two worker nodes and changed them to infra nodes.

       [1] https://docs.openshift.com/container-platform/4.2/machine_management/creating-infrastructure-machinesets.html#infrastructure-moving-monitoring_creating-infrastructure-machinesets

    I tried to move the openshift-state-metrics-xxxxxxxxxx-xxxxx pod from the worker node to an infra node, but it failed to launch on the infra node, as described above.
    Could you please tell me how to deal with this case?
    
    The customer also noticed, when checking the monitoring ConfigMap definition in [1] after this failure, that there is a setting for kubeStateMetrics. They think this is for the "kube"-state-metrics-xxxxxxxxxx-xxxxx pods.
    However, they found no definition for the "openshift"-state-metrics-xxxxxxxxxx-xxxx pods. Is this relevant to this issue, or unrelated?
    
    ```
    Save the following ConfigMap definition as the cluster-monitoring-configmap.yaml file:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |+
        alertmanagerMain:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
        prometheusK8s:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
        prometheusOperator:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
        grafana:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
        k8sPrometheusAdapter:
          nodeSelector:
            node-role.kubernetes.io/infra: ""
        kubeStateMetrics:  ----> Definition for kube-state-metrics-xxxxxxxxxx-xxxxx?
          nodeSelector:
            node-role.kubernetes.io/infra: ""
        telemeterClient:
          nodeSelector:
            node-role.kubernetes.io/infra: ""

    ----> But where is the definition for openshift-state-metrics-xxxxxxxxxx-xxxx? Isn't it needed?
          (Is the content of the manual incomplete, or is it just not needed?)
    ```

Expected results:

   To follow the documented steps at https://docs.openshift.com/container-platform/4.2/machine_management/creating-infrastructure-machinesets.html#infrastructure-moving-monitoring_creating-infrastructure-machinesets, the nodeSelector for the openshift-state-metrics pod should be configurable.

Additional info:

    1. Procedure.txt (14.3 kB)
    
    2. Openshift-monitoring__oc_get_pod_openshift-state-metrics-65488cbc6f-zf5g7.txt (7.6 kB)
       oc get pod openshift-state-metrics -o yaml result
    
    3. Openshift-monitoring__oc_get_pod_alertmanager-main-0.txt (9.9 kB)
       oc get pod alertmanager-main-0 -o yaml result (attached as reference of Pod that starts normally in infra node)
    
    4. Openshift-monitoring__oc_get_all.txt (29.2 kB)
       oc get all results

Comment 6 Pawel Krupa 2020-02-27 14:17:04 UTC
This feature has been supported since 4.2 (or maybe earlier). You need to add:

openshiftStateMetrics:
  nodeSelector:
    node-role.kubernetes.io/infra: ""

to the cluster-monitoring-config ConfigMap, the same way as for the other components.
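
For illustration, a minimal sketch of how that entry might sit in the ConfigMap quoted in comment 1. It assumes the new key goes at the same nesting level as kubeStateMetrics inside config.yaml; the other component entries are elided here and would stay unchanged:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Existing entries (alertmanagerMain, prometheusK8s, grafana, ...) are omitted in this sketch.
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    # Added entry, as given in this comment:
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
```

Once the updated ConfigMap is in place, the openshift-state-metrics pod should be rescheduled with the infra nodeSelector, which can be checked in the pod's spec.nodeSelector.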