Description of problem:

In OpenShift Container Platform 4.10, there is a new feature to debug a Pod in the Web Console. On a Pod that is in "CrashLoopBackOff", customers can click on "Debug container <NAME>" and then get a debug Pod. However, on multiple clusters this fails with "pods "fedora-6b5c67b55c-gdqlk-debug-cpq2b" not found" or "The debug pod failed.". When checking the Pods in the namespace, it can be seen that the "debug" Pod is started but immediately terminates:

~~~
$ oc get pods -w
NAME                                  READY   STATUS             RESTARTS      AGE
fedora-6b5c67b55c-gdqlk               0/1     CrashLoopBackOff   4 (14s ago)   94s
fedora-6b5c67b55c-gdqlk-debug-6rnt2   0/1     Pending            0             0s
fedora-6b5c67b55c-gdqlk-debug-6rnt2   0/1     Pending            0             0s
fedora-6b5c67b55c-gdqlk-debug-6rnt2   0/1     Terminating        0             0s
~~~

Events show the same behaviour:

~~~
$ oc get events -w
LAST SEEN   TYPE     REASON             OBJECT                                    MESSAGE
0s          Normal   Scheduled          pod/fedora-6b5c67b55c-gdqlk-debug-bgsfz   Successfully assigned fedora/fedora-6b5c67b55c-gdqlk-debug-bgsfz to ip-10-0-133-207.eu-central-1.compute.internal
0s          Normal   SuccessfulDelete   replicaset/fedora-6b5c67b55c              Deleted pod: fedora-6b5c67b55c-gdqlk-debug-bgsfz
0s          Normal   AddedInterface     pod/fedora-6b5c67b55c-gdqlk-debug-bgsfz   Add eth0 [10.129.2.16/23] from openshift-sdn
0s          Normal   Pulled             pod/fedora-6b5c67b55c-gdqlk-debug-bgsfz   Container image "registry.fedoraproject.org/fedora:35" already present on machine
0s          Normal   Created            pod/fedora-6b5c67b55c-gdqlk-debug-bgsfz   Created container fedora
0s          Normal   Started            pod/fedora-6b5c67b55c-gdqlk-debug-bgsfz   Started container fedora
0s          Normal   Killing            pod/fedora-6b5c67b55c-gdqlk-debug-bgsfz   Stopping container fedora
~~~

In the kube-apiserver we can see some of the following errors, which may or may not be related:

~~~
I0324 10:43:42.349455      16 node_authorizer.go:203] "NODE DENY" err="node 'ip-10-0-133-207.eu-central-1.compute.internal' cannot get configmap fedora/kube-root-ca.crt, no relationship to this object was found in the node authorizer graph"
I0324 10:43:42.349612      16 node_authorizer.go:203] "NODE DENY" err="node 'ip-10-0-133-207.eu-central-1.compute.internal' cannot get configmap fedora/openshift-service-ca.crt, no relationship to this object was found in the node authorizer graph"
I0324 10:43:42.349707      16 node_authorizer.go:203] "NODE DENY" err="node 'ip-10-0-133-207.eu-central-1.compute.internal' cannot get secret fedora/default-dockercfg-fcs76, no relationship to this object was found in the node authorizer graph"
E0324 10:44:18.969927      16 apiaccess_count_controller.go:161] invalid resource name ".": [may not be '.']
~~~

Version-Release number of selected component (if applicable):

~~~
$ oc version
Client Version: 4.10.3
Server Version: 4.10.6
Kubernetes Version: v1.23.5+b0357ed
~~~

How reproducible:

Always

Steps to Reproduce:
1. Create a Deployment that will CrashLoopBackOff, for example with the following definition:

~~~
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora
  labels:
    app: fedora
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora
  template:
    metadata:
      labels:
        app: fedora
    spec:
      containers:
      - image: registry.fedoraproject.org/fedora:35
        name: fedora
        command: ['cat', '/this-file-does-not-exist']
~~~

2. In the Web Console, navigate to "Workloads" -> "Pods" and locate the Pod that is in "CrashLoopBackOff".
3. Click on the "CrashLoopBackOff" status in the list and click on "Debug container fedora".

Actual results:

The new page is opened and after some time fails with "pods "fedora-6b5c67b55c-gdqlk-debug-kzsw8" not found" or "The debug pod failed."
Expected results:

The Debug Terminal is shown.

Additional info:
- Reproduced it with OpenShift Container Platform 4.10.6
- May or may not be related to Bug 2065672
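As a cross-check, the equivalent debug flow can be attempted from the CLI; a minimal sketch, using the pod name from the output above:

~~~
# `oc debug` creates a copy of the target pod with its command replaced
# by an interactive shell.
$ oc debug pod/fedora-6b5c67b55c-gdqlk

# As far as I can tell, `oc debug` strips the original pod labels by
# default (see the --keep-labels flag), so its copy does not match the
# ReplicaSet selector of the owning Deployment.
~~~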
This is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=2064744. I will leave this open as it refers to 4.10 and is customer-specific. As for the issue itself: for some reason, debug pods created for pods belonging to certain Deployments are terminated immediately, but if I create the pod directly, the debug feature works. Still investigating.
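The "SuccessfulDelete" event from replicaset/fedora-6b5c67b55c in the description hints at the mechanism: the console-created debug pod seems to carry the Deployment's pod-template labels, so the ReplicaSet adopts it as a surplus replica and deletes it again. A minimal sketch of a manually created debug pod that avoids this (illustrative only, not the console's actual payload):

~~~
apiVersion: v1
kind: Pod
metadata:
  # Deliberately no "app: fedora" label: a pod matching the ReplicaSet
  # selector would be adopted and scaled away immediately.
  name: fedora-debug
  namespace: fedora
spec:
  restartPolicy: Never
  containers:
  - name: fedora
    image: registry.fedoraproject.org/fedora:35
    # Replace the crashing command with a shell that stays up.
    command: ['/bin/sh']
    stdin: true
    tty: true
~~~

Creating this with "oc apply -f" and attaching with "oc attach -it fedora-debug" should keep the pod running, consistent with the observation that pods created directly (outside a Deployment) debug fine.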
We have a fix created with PR https://github.com/openshift/console/pull/11229. Closing this as a duplicate; we will update the 4.10 branch after the PR merges.

*** This bug has been marked as a duplicate of bug 2064744 ***