Bug 1800920

Summary: oc describe pod omits toleration
Product: OpenShift Container Platform Reporter: Chet Hosey <ChetRHosey>
Component: oc Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA QA Contact: zhou ying <yinzhou>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.z CC: aos-bugs, jokerman, mfojtik
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 15:55:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chet Hosey 2020-02-09 06:28:34 UTC
Description of problem:

`oc describe pod` isn't listing all tolerations for node-exporter pods.

The node-exporter DaemonSet in openshift-monitoring specifies:

    # oc get ds node-exporter -n openshift-monitoring -o json | jq .spec.template.spec.tolerations
    [
      {
        "operator": "Exists"
      }
    ]

Generated pods have several tolerations. `oc get pod -o json` shows eight, including the one above. `oc describe pod` shows only seven: it omits the key-less `Exists` toleration, which can be misleading.

Version-Release number of selected component (if applicable):

    # oc version
    Client Version: v4.2.15
    Server Version: 4.2.16
    Kubernetes Version: v1.14.6+8bbaf43

How reproducible:

Easily!


Steps to Reproduce:
1. oc describe $(oc get pods -n openshift-monitoring -o name | grep node-exporter- | head -1) -n openshift-monitoring | grep -A10 Tolerations

Actual results:

The Tolerations list from `oc describe` includes only seven entries.

Expected results:

The list should include the { "operator": "Exists" } entry.


Additional info:

The generated pod has a total of eight tolerations, but `oc describe` shows only seven.

    # oc describe $(oc get pods -n openshift-monitoring -o name | grep node-exporter- | head -1) -n openshift-monitoring | grep -A10 Tolerations
    Tolerations:
                     node.kubernetes.io/disk-pressure:NoSchedule
                     node.kubernetes.io/memory-pressure:NoSchedule
                     node.kubernetes.io/network-unavailable:NoSchedule
                     node.kubernetes.io/not-ready:NoExecute
                     node.kubernetes.io/pid-pressure:NoSchedule
                     node.kubernetes.io/unreachable:NoExecute
                     node.kubernetes.io/unschedulable:NoSchedule
    Events:
      Type    Reason     Age   From                                        Message
      ----    ------     ----  ----                                        -------


    # oc get pod node-exporter-mfdbf -o json | jq .spec.tolerations
    [
      {
        "effect": "NoSchedule",
        "key": "node.kubernetes.io/memory-pressure",
        "operator": "Exists"
      },
      {
        "effect": "NoSchedule",
        "key": "node.kubernetes.io/pid-pressure",
        "operator": "Exists"
      },
      {
        "effect": "NoSchedule",
        "key": "node.kubernetes.io/unschedulable",
        "operator": "Exists"
      },
      {
        "effect": "NoSchedule",
        "key": "node.kubernetes.io/network-unavailable",
        "operator": "Exists"
      },
      {
        "operator": "Exists"
      },
      {
        "effect": "NoExecute",
        "key": "node.kubernetes.io/not-ready",
        "operator": "Exists"
      },
      {
        "effect": "NoExecute",
        "key": "node.kubernetes.io/unreachable",
        "operator": "Exists"
      },
      {
        "effect": "NoSchedule",
        "key": "node.kubernetes.io/disk-pressure",
        "operator": "Exists"
      }
    ]
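
The comparison above can also be automated. The following Python sketch is illustrative only: `parse_describe_line` and `missing_from_describe` are hypothetical helper names, the sample data is a shortened copy of the output above, and the parser assumes every line has the `key:effect` shape seen here (no `=value` part).

```python
def parse_describe_line(line):
    """Split a 'key:effect' line from the Tolerations section into (key, effect).

    Assumes the line carries no '=value' part, as in the output above.
    """
    text = line.strip()
    if ":" not in text:
        return (text, "")
    key, _, effect = text.rpartition(":")
    return (key, effect)


def missing_from_describe(tolerations, describe_lines):
    """Return the tolerations from the pod spec that have no matching describe line."""
    shown = {parse_describe_line(line) for line in describe_lines if line.strip()}
    return [
        t for t in tolerations
        if (t.get("key", ""), t.get("effect", "")) not in shown
    ]


# Shortened sample data copied from the outputs above.
tolerations = [
    {"effect": "NoSchedule", "key": "node.kubernetes.io/memory-pressure", "operator": "Exists"},
    {"operator": "Exists"},
    {"effect": "NoExecute", "key": "node.kubernetes.io/not-ready", "operator": "Exists"},
]
describe_lines = [
    "                 node.kubernetes.io/memory-pressure:NoSchedule",
    "                 node.kubernetes.io/not-ready:NoExecute",
]

print(missing_from_describe(tolerations, describe_lines))
```

With this sample data, the only toleration reported as missing is the key-less `Exists` entry.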

Comment 1 Maciej Szulik 2020-02-20 18:01:03 UTC
Moving this to 4.5, for now.

Comment 2 Maciej Szulik 2020-05-11 10:25:28 UTC
Jan, this will require an upstream fix. Once you have it open, feel free to move this BZ to 4.6, although we might consider bringing in some of the upstream fixes in a bigger batch, like we did last time.
It's up to you, depending on how many there will be.

Comment 3 Jan Chaloupka 2020-05-12 14:03:14 UTC
For a toleration to be printed, it has to have at least the .Value or .Effect field set. Currently, the .Operator field is not taken into account.

Upstream documentation at https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/ says:
```
An empty key with operator Exists matches all keys, values and effects which means this will tolerate everything.
tolerations:
- operator: "Exists"

An empty effect matches all effects with key key.
tolerations:
- key: "key"
  operator: "Exists"
```

Thus, displaying {"operator": "Exists"} is still valid.

Additionally, the documentation says:
```
The default value for operator is Equal.

A toleration “matches” a taint if the keys are the same and the effects are the same, and:

the operator is Exists (in which case no value should be specified), or
the operator is Equal and the values are equal.
```
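
The quoted matching rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Kubernetes implementation: `tolerates` is a hypothetical name, tolerations and taints are plain dicts, and API validation (for example, rejecting an empty key with operator Equal) is not modeled.

```python
def tolerates(toleration, taint):
    """Return True if the toleration matches the taint, per the rules quoted above."""
    # An empty effect matches all effects.
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect", ""):
        return False
    # An empty key (with operator Exists) matches all keys.
    key = toleration.get("key", "")
    if key and key != taint.get("key", ""):
        return False
    # The default value for operator is Equal.
    operator = toleration.get("operator") or "Equal"
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


taint = {"key": "node.kubernetes.io/not-ready", "effect": "NoExecute"}

# The key-less Exists toleration tolerates everything.
print(tolerates({"operator": "Exists"}, taint))
# A toleration for a different key does not match.
print(tolerates({"key": "key", "operator": "Exists"}, taint))
```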

Thus, it's OK to keep omitting the operator for the tolerations that are already displayed, as is done now:
```
Tolerations:
   node.kubernetes.io/disk-pressure:NoSchedule
   node.kubernetes.io/memory-pressure:NoSchedule
   node.kubernetes.io/network-unavailable:NoSchedule
   node.kubernetes.io/not-ready:NoExecute
   node.kubernetes.io/pid-pressure:NoSchedule
   node.kubernetes.io/unreachable:NoExecute
   node.kubernetes.io/unschedulable:NoSchedule
```

In the case of {"operator": "Exists"}, though, we might just display:
```
Tolerations:
   op=Exists
```
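
The proposal in this comment can be sketched as a small formatting function. This is a hedged illustration in Python, not the code from the upstream PR: `format_toleration` is a hypothetical name, the `key=value:effect` layout is assumed from the describe output, and `op=Exists` is emitted only when nothing else would be printed.

```python
def format_toleration(toleration):
    """Format one toleration for display; key-less Exists becomes 'op=Exists'."""
    text = toleration.get("key", "")
    if toleration.get("value"):
        text += "=" + toleration["value"]
    if toleration.get("effect"):
        text += ":" + toleration["effect"]
    # Without this special case, {"operator": "Exists"} would render as an
    # empty string and effectively disappear from the list.
    if not text and toleration.get("operator") == "Exists":
        text = "op=Exists"
    return text


print(format_toleration({"operator": "Exists"}))
print(format_toleration(
    {"effect": "NoExecute", "key": "node.kubernetes.io/not-ready", "operator": "Exists"}
))
```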

Comment 4 Jan Chaloupka 2020-05-12 14:10:47 UTC
Upstream PR: https://github.com/kubernetes/kubernetes/pull/91024

Comment 5 Jan Chaloupka 2020-06-18 09:20:03 UTC
Waiting for the next oc rebase

Comment 6 Jan Chaloupka 2020-08-04 16:09:42 UTC
Resolved through https://github.com/openshift/oc/pull/491

Comment 9 zhou ying 2020-08-06 01:43:30 UTC
[root@dhcp-140-138 ~]# oc get po/node-exporter-gbzzp -n openshift-monitoring -o json | jq .spec.tolerations
[
  {
    "operator": "Exists"
  }
]
[root@dhcp-140-138 ~]# oc describe $(oc get pods -n openshift-monitoring -o name | grep node-exporter- | head -1) -n openshift-monitoring | grep -A10 Tolerations
Tolerations:     op=Exists


[root@dhcp-140-138 ~]# oc version --client -o yaml 
clientVersion:
  buildDate: "2020-08-03T19:18:12Z"
  compiler: gc
  gitCommit: a695d74ef1aee9a3f605d38dd8b6fae2062b63fc
  gitTreeState: clean
  gitVersion: 4.6.0-202008031851.p0-a695d74
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64

So, will verify it.

Comment 11 errata-xmlrpc 2020-10-27 15:55:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196