Bug 1798549

Summary: oc debug node/foo does not fail quickly on errimagepull
Product: OpenShift Container Platform
Component: oc
Version: 4.4
Target Release: 4.4.0
Severity: high
Priority: high
Status: CLOSED ERRATA
Type: Bug
Reporter: David Eads <deads>
Assignee: Sally <somalley>
QA Contact: zhou ying <yinzhou>
CC: aos-bugs, jokerman, maszulik, mfojtik
Last Closed: 2020-05-04 11:33:47 UTC

Description David Eads 2020-02-05 15:07:40 UTC
When the image doesn't exist, the image pull fails and will not recover.  The `oc debug node` command should fail quickly when this happens, not hang for minutes on end.

```
apiVersion: v1
kind: Pod
metadata:
  annotations:
    debug.openshift.io/source-container: container-00
    debug.openshift.io/source-resource: /v1, Resource=nodes/ip-10-0-134-191.us-east-2.compute.internal
  creationTimestamp: "2020-02-05T15:03:02Z"
  name: ip-10-0-134-191us-east-2computeinternal-debug
  namespace: default
  resourceVersion: "74856"
  selfLink: /api/v1/namespaces/default/pods/ip-10-0-134-191us-east-2computeinternal-debug
  uid: b46d4d07-796f-4078-81f6-74da492e3157
spec:
  containers:
  - command:
    - /bin/sh
    image: registry.redhat.io/rhel7/support-tools
    imagePullPolicy: Always
    name: container-00
    resources: {}
    securityContext:
      privileged: true
      runAsUser: 0
    stdin: true
    stdinOnce: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    tty: true
    volumeMounts:
    - mountPath: /host
      name: host
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-8nmzh
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  hostPID: true
  imagePullSecrets:
  - name: default-dockercfg-7n99x
  nodeName: ip-10-0-134-191.us-east-2.compute.internal
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /
      type: Directory
    name: host
  - name: default-token-8nmzh
    secret:
      defaultMode: 420
      secretName: default-token-8nmzh
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-02-05T15:03:02Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-02-05T15:03:02Z"
    message: 'containers with unready status: [container-00]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-02-05T15:03:02Z"
    message: 'containers with unready status: [container-00]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-02-05T15:03:02Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: registry.redhat.io/rhel7/support-tools
    imageID: ""
    lastState: {}
    name: container-00
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: Back-off pulling image "registry.redhat.io/rhel7/support-tools"
        reason: ImagePullBackOff
  hostIP: 10.0.134.191
  phase: Pending
  podIP: 10.0.134.191
  podIPs:
  - ip: 10.0.134.191
  qosClass: BestEffort
  startTime: "2020-02-05T15:03:02Z"

```
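The signal a client can act on is the `reason` under `status.containerStatuses[].state.waiting`. A minimal sketch (not the actual `oc` implementation; the struct below only mirrors the relevant YAML fields) of classifying a waiting reason as unrecoverable, in Go:

```go
package main

import "fmt"

// ContainerWaiting mirrors the relevant fields of a pod's
// status.containerStatuses[].state.waiting block.
type ContainerWaiting struct {
	Reason  string
	Message string
}

// fatalPullReason reports whether the waiting reason indicates an image
// pull that will not recover on its own, so the client should stop
// waiting instead of hanging until a timeout. The reason strings are
// the ones the kubelet sets ("ErrImagePull", "ImagePullBackOff");
// anything else is treated as transient.
func fatalPullReason(w ContainerWaiting) bool {
	switch w.Reason {
	case "ErrImagePull", "ImagePullBackOff":
		return true
	}
	return false
}

func main() {
	w := ContainerWaiting{
		Reason:  "ImagePullBackOff",
		Message: `Back-off pulling image "registry.redhat.io/rhel7/support-tools"`,
	}
	fmt.Println(fatalPullReason(w))                                            // true: stop waiting, surface the message
	fmt.Println(fatalPullReason(ContainerWaiting{Reason: "ContainerCreating"})) // false: keep waiting
}
```

With the pod status above, `Reason` is `ImagePullBackOff`, so a check like this would let `oc debug` bail out with the back-off message instead of polling for minutes.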

Once image pulls have failed a certain number of times, we should simply fail and let the user decide what to do:

```
2m12s  Normal   Pulling  pod/ip-10-0-134-191us-east-2computeinternal-debug  spec.containers{container-00}  kubelet, ip-10-0-134-191.us-east-2.compute.internal  Pulling image "registry.redhat.io/rhel7/support-tools"  3m31s  4  ip-10-0-134-191us-east-2computeinternal-debug.15f089cd312e1d75
2m12s  Warning  Failed   pod/ip-10-0-134-191us-east-2computeinternal-debug  spec.containers{container-00}  kubelet, ip-10-0-134-191.us-east-2.compute.internal  Failed to pull image "registry.redhat.io/rhel7/support-tools": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication  3m31s  4  ip-10-0-134-191us-east-2computeinternal-debug.15f089cd44414d30
2m12s  Warning  Failed   pod/ip-10-0-134-191us-east-2computeinternal-debug  spec.containers{container-00}  kubelet, ip-10-0-134-191.us-east-2.compute.internal  Error: ErrImagePull  3m31s  4  ip-10-0-134-191us-east-2computeinternal-debug.15f089cd4441cde1
104s   Normal   BackOff  pod/ip-10-0-134-191us-east-2computeinternal-debug  spec.containers{container-00}  kubelet, ip-10-0-134-191.us-east-2.compute.internal  Back-off pulling image "registry.redhat.io/rhel7/support-tools"  3m30s  6  ip-10-0-134-191us-east-2computeinternal-debug.15f089cd6560d425
89s    Warning  Failed   pod/ip-10-0-134-191us-east-2computeinternal-debug  spec.containers{container-00}  kubelet, ip-10-0-134-191.us-east-2.compute.internal  Error: ImagePullBackOff  3m30s  7  ip-10-0-134-191us-east-2computeinternal-debug.15f089cd65611568

```
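One way to realize the "fail after a certain number of events" idea is to tally the `Failed`/`BackOff` pull events for the debug pod (each carries a count, as in the dump above) and give up past a threshold. A hypothetical sketch, assuming a threshold of 5 and a simplified event shape; this is not necessarily how the actual fix works:

```go
package main

import "fmt"

// PullEvent mirrors the two event fields that matter here: the reason
// ("Pulling", "Failed", "BackOff", ...) and how many times it fired.
type PullEvent struct {
	Reason string
	Count  int
}

// maxPullFailures is an assumed threshold; the real client may use a
// different number, or a timeout, or both.
const maxPullFailures = 5

// checkPullFailures returns an error once the accumulated count of
// failed/back-off pull events crosses the threshold, so the caller can
// abort the debug session instead of waiting indefinitely.
func checkPullFailures(events []PullEvent) error {
	failures := 0
	for _, e := range events {
		if e.Reason == "Failed" || e.Reason == "BackOff" {
			failures += e.Count
		}
	}
	if failures >= maxPullFailures {
		return fmt.Errorf("giving up after %d image pull failures", failures)
	}
	return nil
}

func main() {
	// Counts taken from the event dump above: 4 Failed + 6 BackOff.
	events := []PullEvent{
		{Reason: "Pulling", Count: 4},
		{Reason: "Failed", Count: 4},
		{Reason: "BackOff", Count: 6},
	}
	if err := checkPullFailures(events); err != nil {
		fmt.Println("error:", err)
	}
}
```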

Comment 1 Sally 2020-02-11 20:52:27 UTC
Opened https://github.com/openshift/oc/pull/277 for multiple BZs, including this one.

Comment 3 zhou ying 2020-02-18 09:53:19 UTC
Confirmed with the latest oc client; can't reproduce the issue now:

[root@dhcp-140-138 ~]# oc version -o yaml 
clientVersion:
  buildDate: "2020-02-14T07:28:29Z"
  compiler: gc
  gitCommit: 5d7a12f03389b03b651f963cb5ee8ddfa9cff559
  gitTreeState: clean
  gitVersion: v4.4.0
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64


[root@dhcp-140-138 ~]# oc debug node/yinzho-xxxx
Starting pod/yinzho-xxx ...
To use host binaries, run `chroot /host`

Removing debug pod ...
error: Back-off pulling image "registry.redhat.io/rhel7/support-tools"

Comment 5 errata-xmlrpc 2020-05-04 11:33:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581