Description of problem:
When running must-gather with the CNV image and a wrong image path (for example, due to a typo), the log doesn't properly indicate the cause; instead you see a timeout failure.

Version-Release number of selected component (if applicable):
2.3.0-45

How reproducible:
100%

Steps to Reproduce:
1. Run must-gather with an image that will cause an InspectFailed error
2. Let it run until it reports a failure

Actual results:
Executed this on a disconnected env:

oc adm must-gather --image=registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT Using must-gather plugin-in image: registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT namespace/openshift-must-gather-xhgkc created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-zqlgv created
[must-gather ] OUT pod for plug-in image registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45 created
[must-gather-5hcxc] OUT gather did not start: timed out waiting for the condition
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-zqlgv deleted
[must-gather ] OUT namespace/openshift-must-gather-xhgkc deleted
error: gather did not start for pod must-gather-5hcxc: timed out waiting for the condition

Expected results:
Instead of a timeout, I would have liked to get an error message indicating that must-gather wasn't even able to start.

Additional info:
While must-gather is running, you can go to the must-gather namespace it created and get or describe the pods to see the error. Once the must-gather command fails, however, that namespace is removed, so we lose the ability to understand the issue. If you do run describe, it is very easy to see where the problem is. This data should be reflected in the must-gather command output.

Example:

Events:
  Type     Reason         Age               From                         Message
  ----     ------         ----              ----                         -------
  Normal   Scheduled      <unknown>         default-scheduler            Successfully assigned openshift-must-gather-c9kdk/must-gather-dq9gq to openshift-worker-1
  Warning  InspectFailed  7s (x9 over 82s)  kubelet, openshift-worker-1  Failed to apply default image tag "registry=registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45": couldn't parse image reference "registry=registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45": invalid reference format
  Warning  Failed         7s (x9 over 82s)  kubelet, openshift-worker-1  Error: InvalidImageName

I do want to mention that in some cases the error is populated, so this is probably a corner case that wasn't handled. Examples of good outputs I do see:

1. error: gather did not start for pod must-gather-97nsp: unable to pull image: ErrImagePull: rpc error: code = Unknown desc = error pinging docker registry registry.bla.com:5000: Get https://registry.bla.com:5000/v2/: dial tcp: lookup registry.bla.com on 10.46.29.133:53: no such host

2. error: gather did not start for pod must-gather-m6xpg: unable to pull image: ErrImagePull: rpc error: code = Unknown desc = Error reading manifest latest in registry.ocp-edge.lab.eng.tlv2.redhat.com:5000/container-native-virtualization/cnv-must-gather-rhel9: manifest unknown: manifest unknown

3. error: gather did not start for pod must-gather-6qhns: unable to pull image: ErrImagePull: rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
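To illustrate the requested behavior, here is a minimal Python sketch that maps a pod's container waiting reason to an error message in the same shape as the good outputs above, rather than waiting for a timeout. It assumes the pod is available as a dictionary following the Kubernetes pod status layout (`status.initContainerStatuses[].state.waiting` and `status.containerStatuses[].state.waiting`, each with `reason` and `message`). The function name `image_failure` and the set `IMAGE_FAILURE_REASONS` are hypothetical, and this is not the actual oc implementation.

```python
from typing import Optional

# Waiting reasons that mean the container cannot start because of its image.
# ErrImagePull is already reported in the good outputs above; InvalidImageName
# is the case from the events in this report. ImagePullBackOff is included on
# the assumption that it should be treated the same way.
IMAGE_FAILURE_REASONS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName"}


def image_failure(pod: dict) -> Optional[str]:
    """Return an error string if any container is stuck on an image problem."""
    status = pod.get("status") or {}
    # Assumption: check both init containers and regular containers.
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    for container_status in statuses:
        waiting = (container_status.get("state") or {}).get("waiting")
        if not waiting:
            continue
        reason = waiting.get("reason", "")
        if reason in IMAGE_FAILURE_REASONS:
            message = waiting.get("message", "")
            return f"unable to pull image: {reason}: {message}"
    return None
```

A polling loop could call such a check on every iteration and stop early with the returned message, so the cause is reported before the namespace is deleted.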
Verified with the payload below. I no longer see "timed out waiting for the condition"; instead I see an ImagePullBackOff error, as described in the patch "Treat ImagePullBackOff and InvalidImageName the same as ErrImagePull".

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-27-075304]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-27-075304
Server Version: 4.6.0-0.nightly-2020-09-27-075304
Kubernetes Version: v1.19.0+e465e66

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-27-075304]$ ./oc adm must-gather --image=registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT Using must-gather plugin-in image: registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT namespace/openshift-must-gather-5c5sb created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-qjz7b created
[must-gather ] OUT pod for plug-in image registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45 created
[must-gather-7cct8] OUT gather did not start: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45"
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-qjz7b deleted
[must-gather ] OUT namespace/openshift-must-gather-5c5sb deleted
error: gather did not start for pod must-gather-7cct8: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.redhat.io/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45"

@yuval, can you please let me know if any validation is required here other than the above? Thanks!
Nice, you could verify with an InvalidImageName (capital letters for the image registry should do the trick). Other than this, I think you covered it, unless @Nelly wants to test anything else here. Thanks !
Thanks Yuval. I tried again as you suggested, and it returns an ImagePullBackOff error.

[ramakasturinarra@dhcp35-60 ~]$ oc adm must-gather --image=registry.redhat.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT Using must-gather plugin-in image: registry.redhat.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT namespace/openshift-must-gather-hcd66 created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-98rln created
[must-gather ] OUT pod for plug-in image registry.redhat.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45 created
[must-gather-2lcqv] OUT gather did not start: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.redhat.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45"
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-98rln deleted
[must-gather ] OUT namespace/openshift-must-gather-hcd66 deleted
error: gather did not start for pod must-gather-2lcqv: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.redhat.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45"

[ramakasturinarra@dhcp35-60 ~]$ oc adm must-gather --image=REGISTRY.REDHAT.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT Using must-gather plugin-in image: REGISTRY.REDHAT.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45
[must-gather ] OUT namespace/openshift-must-gather-l4bnv created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-pndm7 created
[must-gather ] OUT pod for plug-in image REGISTRY.REDHAT.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45 created
[must-gather-fcnlw] OUT gather did not start: unable to pull image: ImagePullBackOff: Back-off pulling image "REGISTRY.REDHAT.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45"
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-pndm7 deleted
[must-gather ] OUT namespace/openshift-must-gather-l4bnv deleted
error: gather did not start for pod must-gather-fcnlw: unable to pull image: ImagePullBackOff: Back-off pulling image "REGISTRY.REDHAT.IO/container-native-virtualization-cnv/must-gather-rhel8:v2.3.0-45"

I will wait for Nelly before moving the bug to the verified state.
I tried with something like ./oc adm must-gather --image=AAAAA
Trying the above gives me the error below.

[ramakasturinarra@dhcp35-60 ~]$ oc adm must-gather --image=AAAAA
[must-gather ] OUT Using must-gather plugin-in image: AAAAA
[must-gather ] OUT namespace/openshift-must-gather-2qc9w created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-vkzmv created
[must-gather ] OUT unable to parse image reference AAAAA: repository name must be lowercase
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-vkzmv deleted
[must-gather ] OUT namespace/openshift-must-gather-2qc9w deleted
error: repository name must be lowercase
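The different outcomes in the last few comments (an uppercase registry host is accepted and ends in ImagePullBackOff, while AAAAA is rejected with a parse error) are consistent with a reference grammar in which only the repository path, not the registry host, must be lowercase. That grammar is an assumption here and is not confirmed in this thread. A simplified Python sketch of such a check follows; `split_domain` and `check_repository_case` are hypothetical helpers, not oc functions, and the host-detection rule is a simplification.

```python
from typing import Optional, Tuple


def split_domain(ref: str) -> Tuple[str, str]:
    """Split a reference into (registry host, remainder).

    Simplified assumption: the first path component counts as a registry
    host only if it contains '.' or ':' or is exactly 'localhost'.
    """
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "", ref


def check_repository_case(ref: str) -> Optional[str]:
    """Return an error message if the repository path has uppercase letters."""
    _, remainder = split_domain(ref)
    path = remainder.split("@", 1)[0]  # drop a digest, if any
    last = path.rsplit("/", 1)[-1]
    if ":" in last:  # drop a tag, if any
        path = path[: path.rindex(":")]
    if path != path.lower():
        return "repository name must be lowercase"
    return None
```

Under these assumptions, `AAAAA` has no registry host, so the whole string is treated as the repository name and fails the check, while `REGISTRY.REDHAT.IO/...` passes because only the host part is uppercase.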
Based on the above comments, moving the bug to the verified state. @Nelly, please feel free to reopen if you find that something obvious is missing, in case you get a chance to test this. Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196