Description of problem:
It is confusing to figure out which machine-config-daemon (MCD) pod's logs to check, because the message does not explicitly include the pod's hash suffix; it only points at the generic command `oc logs -f -n openshift-machine-config-operator machine-config-daemon-<hash>`. This makes troubleshooting more difficult.

Expected results:
The message should state which pod to check.
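Until the message names the pod, one workaround is to look it up yourself: MCD pods run one per node, so a field selector on the node name isolates the right one. A minimal sketch, assuming a placeholder node name (the `oc` commands are shown as comments since they require a live cluster):

```shell
# Placeholder node name -- substitute the node being drained.
NODE=worker-0.example.com
# Field selector matching pods scheduled on that node.
SELECTOR="spec.nodeName=${NODE}"
echo "$SELECTOR"
# Against a live cluster, this narrows the MCD pods to the one on that node:
#   oc -n openshift-machine-config-operator get pods --field-selector "$SELECTOR"
#   oc -n openshift-machine-config-operator logs machine-config-daemon-<hash> -c machine-config-daemon
```

The same selector trick is used later in this verification to confirm which machine-config-daemon pod is on the drained worker.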
Created attachment 1780869 [details]
Verification
Verified on 4.8.0-0.nightly-2021-05-07-075528. Triggered the mcd_drain_err alert with the steps below, then looked at Prometheus.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-07-075528   True        False         67m     Cluster version is 4.8.0-0.nightly-2021-05-07-075528

$ cd openshift/
$ cat pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dontevict
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: dontevict

$ oc create -f pdb.yaml
poddisruptionbudget.policy/dontevict created

$ oc get pdb
NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
dontevict   1               N/A               0                     7s

$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-9wr6012-f76d1-z7bjv-master-0         Ready    master   89m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-master-1         Ready    master   89m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-master-2         Ready    master   89m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-b-gwn8c   Ready    worker   80m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-c-c2ndb   Ready    worker   80m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-d-2sc2x   Ready    worker   80m   v1.21.0-rc.0+291e731

$ oc run --restart=Never --labels app=dontevict --overrides='{ "spec": { "nodeSelector": { "kubernetes.io/hostname": "ci-ln-9wr6012-f76d1-z7bjv-worker-b-gwn8c"} } }' --image=docker.io/busybox dont-evict-this-pod -- sleep 1h
pod/dont-evict-this-pod created

$ oc get pods
NAME                  READY   STATUS    RESTARTS   AGE
dont-evict-this-pod   1/1     Running   0          7s

$ cat file-ig3.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-file
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
          filesystem: root
          mode: 0644
          path: /etc/test

$ oc create -f file-ig3.yaml
machineconfig.machineconfiguration.openshift.io/test-file created

$ oc get nodes
NAME                                       STATUS                     ROLES    AGE    VERSION
ci-ln-9wr6012-f76d1-z7bjv-master-0         Ready                      master   100m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-master-1         Ready                      master   100m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-master-2         Ready                      master   100m   v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-b-gwn8c   Ready,SchedulingDisabled   worker   91m    v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-c-c2ndb   Ready                      worker   91m    v1.21.0-rc.0+291e731
ci-ln-9wr6012-f76d1-z7bjv-worker-d-2sc2x   Ready                      worker   91m    v1.21.0-rc.0+291e731

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-617f5a3d5cb1e6c6b2e34b8a7294d683   True      False      False      3              3                   3                     0                      98m
worker   rendered-worker-15776fcc21358742a1f4cb79346b7d50   False     True       False      3              1                   1                     0                      98m

$ oc -n openshift-monitoring get routes
NAME                HOST/PORT                                                                                             PATH   SERVICES            PORT    TERMINATION          WILDCARD
alertmanager-main   alertmanager-main-openshift-monitoring.apps.ci-ln-9wr6012-f76d1.origin-ci-int-gce.dev.openshift.com          alertmanager-main   web     reencrypt/Redirect   None
grafana             grafana-openshift-monitoring.apps.ci-ln-9wr6012-f76d1.origin-ci-int-gce.dev.openshift.com                    grafana             https   reencrypt/Redirect   None
prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.ci-ln-9wr6012-f76d1.origin-ci-int-gce.dev.openshift.com             prometheus-k8s      web     reencrypt/Redirect   None
thanos-querier      thanos-querier-openshift-monitoring.apps.ci-ln-9wr6012-f76d1.origin-ci-int-gce.dev.openshift.com             thanos-querier      web     reencrypt/Redirect   None

$ oc -n openshift-console get routes
NAME        HOST/PORT                                                                                  PATH   SERVICES    PORT    TERMINATION          WILDCARD
console     console-openshift-console.apps.ci-ln-9wr6012-f76d1.origin-ci-int-gce.dev.openshift.com            console     https   reencrypt/Redirect   None
downloads   downloads-openshift-console.apps.ci-ln-9wr6012-f76d1.origin-ci-int-gce.dev.openshift.com          downloads   http    edge/Redirect        None

$ oc get pods -A --field-selector spec.nodeName=ci-ln-9wr6012-f76d1-z7bjv-worker-b-gwn8c
NAMESPACE                                NAME                           READY   STATUS    RESTARTS   AGE
default                                  dont-evict-this-pod            1/1     Running   0          13m
openshift-cluster-csi-drivers            gcp-pd-csi-driver-node-l9sm4   3/3     Running   0          94m
openshift-cluster-node-tuning-operator   tuned-qt6d5                    1/1     Running   0          94m
openshift-dns                            dns-default-hl5bn              2/2     Running   0          93m
openshift-dns                            node-resolver-v7vdx            1/1     Running   0          94m
openshift-image-registry                 node-ca-r4mnv                  1/1     Running   0          94m
openshift-ingress-canary                 ingress-canary-knnbf           1/1     Running   0          93m
openshift-machine-config-operator        machine-config-daemon-zgc8w    2/2     Running   0          94m
openshift-monitoring                     node-exporter-qkzkz            2/2     Running   0          94m
openshift-multus                         multus-pv4px                   1/1     Running   0          94m
openshift-multus                         network-metrics-daemon-zk2vz   2/2     Running   0          94m
openshift-network-diagnostics            network-check-target-wkr6m     1/1     Running   0          94m
openshift-sdn                            sdn-dnb8q                      2/2     Running   0          94m

$ oc -n openshift-machine-config-operator logs machine-config-daemon-zgc8w -c machine-config-daemon
E0507 18:59:57.986704    1970 daemon.go:330] WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/dont-evict-this-pod; ignoring DaemonSet-managed Pods: openshift-cluster-csi-drivers/gcp-pd-csi-driver-node-l9sm4, openshift-cluster-node-tuning-operator/tuned-qt6d5, openshift-dns/dns-default-hl5bn, openshift-dns/node-resolver-v7vdx, openshift-image-registry/node-ca-r4mnv, openshift-ingress-canary/ingress-canary-knnbf, openshift-machine-config-operator/machine-config-daemon-zgc8w, openshift-monitoring/node-exporter-qkzkz, openshift-multus/multus-pv4px, openshift-multus/network-metrics-daemon-zk2vz, openshift-network-diagnostics/network-check-target-wkr6m, openshift-sdn/sdn-dnb8q
I0507 18:59:57.992498    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 18:59:58.001489    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:03.004519    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:03.012675    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:08.013652    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:08.023949    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:13.027718    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:13.038685    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:18.042846    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:18.054457    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:23.055472    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:23.067515    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:28.071655    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:28.081238    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:33.082191    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:33.092854    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:38.097446    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:38.105617    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0507 19:00:43.108816    1970 daemon.go:330] evicting pod default/dont-evict-this-pod
E0507 19:00:43.118708    1970 daemon.go:330] error when evicting pods/"dont-evict-this-pod" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
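As a side check on the MachineConfig used to trigger the update, its `data:` URL payload can be decoded locally to see exactly what the MCD would write to /etc/test (the base64 string below is copied verbatim from file-ig3.yaml):

```shell
# Base64 payload copied from the MachineConfig's data URL (the part after "base64,").
PAYLOAD='c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK'
# Decode it; this is the file content delivered to /etc/test on each worker.
printf '%s' "$PAYLOAD" | base64 -d
```

The decoded content is three harmless "server ... maxdelay 0.4 offline" lines, i.e. the config change only exists to force a worker-pool rollout and the drain.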
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438