Bug 1882304

Summary: Must gather pod doesn't run on the master node
Product: OpenShift Container Platform
Reporter: Qin Ping <piqin>
Component: oc
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: low
Priority: low
Version: 4.6
CC: aos-bugs, jokerman, knarra, maszulik, mfojtik
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-27 16:45:08 UTC
Type: Bug

Description Qin Ping 2020-09-24 09:48:49 UTC
Description of Problem:
Must gather pod doesn't run on the master node

Version-Release number of selected component (if applicable):
$ oc version
Client Version: 4.6.0-0.nightly-2020-09-24-015627
Server Version: 4.6.0-0.nightly-2020-09-23-022756


How Reproducible:
Always


Steps to Reproduce:
oc adm must-gather -h

Actual Results:
Options:
      --dest-dir='': Set a specific directory on the local machine to write gathered data to.
      --image=[]: Specify a must-gather plugin image to run. If not specified, OpenShift's default must-gather image
will be used.
      --image-stream=[]: Specify an image stream (namespace/name:tag) containing a must-gather plugin image to run.
      --node-name='': Set a specific node to use - by default a random master will be used
      --source-dir='/must-gather/': Set the specific directory on the pod copy the gathered data from.
      --timeout=600: The length of time to gather data, in seconds. Defaults to 10 minutes.

Per the help text, when --node-name is not specified the must-gather pod should run on a master node; in practice it does not.

$ oc get pod must-gather-l4hkr -n openshift-must-gather-n8mss -o json | jq .spec
{
  "containers": [
    {
  <--skip-->
  "nodeName": "ip-10-0-164-189.us-east-2.compute.internal",
  "nodeSelector": {
    "kubernetes.io/os": "linux"
  },
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "Never",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "default",
  "serviceAccountName": "default",
  "terminationGracePeriodSeconds": 0,
  "tolerations": [
    {
      "operator": "Exists"
    }
  ],
  <--skip-->
}

$ oc get nodes ip-10-0-164-189.us-east-2.compute.internal
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-164-189.us-east-2.compute.internal   Ready    worker   3h36m   v1.19.0+8a39924

Expected Results:
Ensure the must-gather pod runs on a master node, or update the help text.

Comment 2 Jan Chaloupka 2020-09-30 10:17:23 UTC
Checking the master HEAD and the git history, it does not seem there was ever a mechanism to gravitate the must-gather pod(s) toward master nodes by default.
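The verified pod spec in comment 4 shows what the fix amounts to: when no --node-name is given, the pod's nodeSelector gains a control-plane role label. A minimal sketch of that selection logic, with a hypothetical helper name (the actual oc source is Go; this is only an illustration of the behavior):

```python
def build_node_selector(node_name=None):
    """Illustrative sketch: build the nodeSelector for the must-gather pod.

    When no node is pinned, add the master role label so the scheduler
    places the pod on a control-plane node; when a node name is given,
    the explicit nodeName pins the pod and no role label is needed.
    """
    selector = {"kubernetes.io/os": "linux"}
    if not node_name:
        selector["node-role.kubernetes.io/master"] = ""
    return selector
```

With no argument this returns both the os and master-role labels, matching the verified spec; with an explicit node name it returns only the os label, matching the --node-name specs below.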

Comment 4 RamaKasturi 2020-10-01 13:23:12 UTC
Verified with the payload below, and I see that must-gather runs on a master node when no --node-name is specified.

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-10-01-070841
Server Version: 4.6.0-0.nightly-2020-10-01-041253
Kubernetes Version: v1.19.0+beb741b

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ oc get nodes | grep master
ip-10-0-148-147.us-east-2.compute.internal   Ready    master   114m   v1.19.0+beb741b
ip-10-0-179-122.us-east-2.compute.internal   Ready    master   114m   v1.19.0+beb741b
ip-10-0-198-134.us-east-2.compute.internal   Ready    master   114m   v1.19.0+beb741b
[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ ./oc adm must-gather

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ oc get pod must-gather-kfjc5 -n openshift-must-gather-827br -o json | jq .spec
{
  "containers": [
    {
      "command": [
        "/bin/bash",
        "-c",
        "/usr/bin/gather; sync"
      ],
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3fd026964eb4eb754fc6c28e241bc54f61f71c084424549488b34ecb7c86ba7f",
      "imagePullPolicy": "IfNotPresent",
      "name": "gather",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/must-gather",
          "name": "must-gather-output"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-z9th7",
          "readOnly": true
        }
      ]
    },
    {
      "command": [
        "/bin/bash",
        "-c",
        "trap : TERM INT; sleep infinity & wait"
      ],
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3fd026964eb4eb754fc6c28e241bc54f61f71c084424549488b34ecb7c86ba7f",
      "imagePullPolicy": "IfNotPresent",
      "name": "copy",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/must-gather",
          "name": "must-gather-output"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-z9th7",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "imagePullSecrets": [
    {
      "name": "default-dockercfg-lpfsl"
    }
  ],
  "nodeName": "ip-10-0-148-147.us-east-2.compute.internal",
  "nodeSelector": {
    "kubernetes.io/os": "linux",
    "node-role.kubernetes.io/master": ""
  },
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "Never",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "default",
  "serviceAccountName": "default",
  "terminationGracePeriodSeconds": 0,
  "tolerations": [
    {
      "operator": "Exists"
    }
  ],
  "volumes": [
    {
      "emptyDir": {},
      "name": "must-gather-output"
    },
    {
      "name": "default-token-z9th7",
      "secret": {
        "defaultMode": 420,
        "secretName": "default-token-z9th7"
      }
    }
  ]
}
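A saved pod spec (e.g. from `oc get pod ... -o json | jq .spec`) can be checked offline for the control-plane selector; here is a quick sketch against an abbreviated fragment of the spec above:

```python
import json

# Abbreviated fragment of the verified must-gather pod spec above
spec = json.loads("""
{
  "nodeName": "ip-10-0-148-147.us-east-2.compute.internal",
  "nodeSelector": {
    "kubernetes.io/os": "linux",
    "node-role.kubernetes.io/master": ""
  }
}
""")

# The fixed behavior: without --node-name, the selector carries the master role label
targets_master = "node-role.kubernetes.io/master" in spec.get("nodeSelector", {})
print(targets_master)  # True for the verified spec
```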

When --node-name is specified, the pod runs on that node.

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ ./oc get nodes | grep worker
ip-10-0-132-124.us-east-2.compute.internal   Ready    worker   116m   v1.19.0+beb741b
ip-10-0-181-128.us-east-2.compute.internal   Ready    worker   116m   v1.19.0+beb741b
ip-10-0-196-150.us-east-2.compute.internal   Ready    worker   116m   v1.19.0+beb741b

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ ./oc adm must-gather --node-name=ip-10-0-132-124.us-east-2.compute.internal

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ oc get pod must-gather-hnttn -n openshift-must-gather-wblh7 -o json | jq .spec
{
  "containers": [
    {
      "command": [
        "/bin/bash",
        "-c",
        "/usr/bin/gather; sync"
      ],
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3fd026964eb4eb754fc6c28e241bc54f61f71c084424549488b34ecb7c86ba7f",
      "imagePullPolicy": "IfNotPresent",
      "name": "gather",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/must-gather",
          "name": "must-gather-output"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-dkq6b",
          "readOnly": true
        }
      ]
    },
    {
      "command": [
        "/bin/bash",
        "-c",
        "trap : TERM INT; sleep infinity & wait"
      ],
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3fd026964eb4eb754fc6c28e241bc54f61f71c084424549488b34ecb7c86ba7f",
      "imagePullPolicy": "IfNotPresent",
      "name": "copy",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/must-gather",
          "name": "must-gather-output"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-dkq6b",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "imagePullSecrets": [
    {
      "name": "default-dockercfg-wpd4q"
    }
  ],
  "nodeName": "ip-10-0-132-124.us-east-2.compute.internal",
  "nodeSelector": {
    "kubernetes.io/os": "linux"
  },
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "Never",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "default",
  "serviceAccountName": "default",
  "terminationGracePeriodSeconds": 0,
  "tolerations": [
    {
      "operator": "Exists"
    }
  ],
  "volumes": [
    {
      "emptyDir": {},
      "name": "must-gather-output"
    },
    {
      "name": "default-token-dkq6b",
      "secret": {
        "defaultMode": 420,
        "secretName": "default-token-dkq6b"
      }
    }
  ]
}

When --node-name is specified as one of the master nodes, the pod gets scheduled on that node.

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ ./oc get nodes | grep master
ip-10-0-148-147.us-east-2.compute.internal   Ready    master   130m   v1.19.0+beb741b
ip-10-0-179-122.us-east-2.compute.internal   Ready    master   130m   v1.19.0+beb741b
ip-10-0-198-134.us-east-2.compute.internal   Ready    master   130m   v1.19.0+beb741b
[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ ./oc adm must-gather --node-name=ip-10-0-179-122.us-east-2.compute.internal

[knarra@knarra openshift-client-linux-4.6.0-0.nightly-2020-10-01-070841]$ oc get pod must-gather-75gng -n openshift-must-gather-lxc44 -o json | jq .spec
{
  "containers": [
    {
      "command": [
        "/bin/bash",
        "-c",
        "/usr/bin/gather; sync"
      ],
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3fd026964eb4eb754fc6c28e241bc54f61f71c084424549488b34ecb7c86ba7f",
      "imagePullPolicy": "IfNotPresent",
      "name": "gather",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/must-gather",
          "name": "must-gather-output"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-7rtm9",
          "readOnly": true
        }
      ]
    },
    {
      "command": [
        "/bin/bash",
        "-c",
        "trap : TERM INT; sleep infinity & wait"
      ],
      "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3fd026964eb4eb754fc6c28e241bc54f61f71c084424549488b34ecb7c86ba7f",
      "imagePullPolicy": "IfNotPresent",
      "name": "copy",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/must-gather",
          "name": "must-gather-output"
        },
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-7rtm9",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "imagePullSecrets": [
    {
      "name": "default-dockercfg-ld87g"
    }
  ],
  "nodeName": "ip-10-0-179-122.us-east-2.compute.internal",
  "nodeSelector": {
    "kubernetes.io/os": "linux"
  },
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "Never",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "default",
  "serviceAccountName": "default",
  "terminationGracePeriodSeconds": 0,
  "tolerations": [
    {
      "operator": "Exists"
    }
  ],
  "volumes": [
    {
      "emptyDir": {},
      "name": "must-gather-output"
    },
    {
      "name": "default-token-7rtm9",
      "secret": {
        "defaultMode": 420,
        "secretName": "default-token-7rtm9"
      }
    }
  ]
}


Based on the above, moving the bug to the verified state.

Comment 7 errata-xmlrpc 2020-10-27 16:45:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196