Description of problem:

`oc debug node/<node-name>` does not start a remote shell. It hangs for a while before reporting an error. Looking at the pod it created, you can see that it tries to pull an image that is not available for s390x:

  Failed to pull image "registry.redhat.io/rhel7/support-tools": rpc error: code = Unknown desc = no image found in manifest list for architecture s390x, OS linux

Version-Release number of selected component (if applicable):

  # oc version
  Client Version: openshift-clients-4.2.2-201910250432-12-g72076900
  Server Version: 4.2.12-s390x
  Kubernetes Version: v1.14.6+32dc4a0
We will need to mention in the documentation that s390x users need to specify the RHEL 8 support-tools image:

  oc debug --image=registry.redhat.io/rhel8/support-tools
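For what it's worth, a client-side wrapper could select the image by architecture automatically. A minimal sketch; only the s390x case is taken from this bug, and the default branch is an assumption:

```shell
# Pick a support-tools image for `oc debug --image=...` based on the
# local machine architecture. Only the s390x mapping comes from this
# bug report; the fallback image is an assumption.
support_tools_image() {
  case "$(uname -m)" in
    s390x) echo "registry.redhat.io/rhel8/support-tools" ;;
    *)     echo "registry.redhat.io/rhel7/support-tools" ;;
  esac
}

# Hypothetical usage against a cluster:
#   oc debug --image="$(support_tools_image)" node/<node-name>
```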
This works for me.
Also reported in https://bugzilla.redhat.com/show_bug.cgi?id=1777030. Let's use this one though, since it has more information.
*** Bug 1777030 has been marked as a duplicate of this bug. ***
*** Bug 1777031 has been marked as a duplicate of this bug. ***
*** Bug 1778807 has been marked as a duplicate of this bug. ***
https://access.redhat.com/containers/#/registry.access.redhat.com/rhel7/support-tools now reports that it supports AMD64, ppc64le, and s390x.
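One way to double-check which architectures a manifest list actually covers is to inspect the raw manifest, e.g. with `skopeo inspect --raw docker://registry.redhat.io/rhel7/support-tools`. A minimal sketch over a saved manifest list with hypothetical contents (digests and most fields omitted):

```shell
# Hypothetical manifest list, shaped like the output of
# `skopeo inspect --raw docker://registry.redhat.io/rhel7/support-tools`.
# The architecture entries are the point here; everything else is elided.
manifest='{"manifests":[
  {"platform":{"architecture":"amd64","os":"linux"}},
  {"platform":{"architecture":"ppc64le","os":"linux"}},
  {"platform":{"architecture":"s390x","os":"linux"}}]}'

# Extract the architectures present in the manifest list.
echo "$manifest" | grep -o '"architecture":"[a-z0-9]*"' | cut -d'"' -f4
```

If `s390x` appears in the output, the image pull from an s390x node should no longer fail with "no image found in manifest list".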
I tested `oc debug node/<node-name>` on four different clusters. It worked on three of the four. I noticed that the clusters on which the command worked run version 4.2.16, while the one on which it didn't work runs 4.2.12. The cluster version should not influence whether the command works, since the registry is external. Do you have any idea why the command works on only three of the four clusters?
(In reply to wvoesch from comment #10)
> I tested oc debug node/<node-name> on four different clusters. It worked on
> three out of the four clusters.

Not acceptable - please reopen the bug!
I tested again, and now the command generally works on all four clusters. However, this time I once got a different error:

  # oc debug node/<node-name>
  Starting pod/<node-name-debug> ...
  To use host binaries, run `chroot /host`
  Removing debug pod ...
  error: unable to create the debug pod "<node-name-debug>"

I tried to start the debug container on another node and it worked, so I retried on the node where it had failed the first time. The second time it worked:

  # oc debug node/<node-name>
  Starting pod/<node-name-debug> ...
  To use host binaries, run `chroot /host`
  Pod IP: <pod-ip>
  If you don't see a command prompt, try pressing enter.
  sh-4.2#
So it still doesn't work reliably. Can you reopen this bug? For the future: can you automate this test case? Try it on every cluster you have, and on each cluster try it 100 times. We should probably keep this in the regression test bucket for a while.
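A loop along the lines requested above could be sketched roughly as follows. This is only an illustration: `retry_test` is a made-up helper, and the real command under test would be something like `oc debug node/<node-name> -- true`.

```shell
# Run a command N times and report how often it failed, to surface
# intermittent failures like the one seen with `oc debug`.
retry_test() {
  attempts=$1; shift
  failures=0
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "failures: $failures/$attempts"
}

# Hypothetical usage against a real cluster:
#   retry_test 100 oc debug node/<node-name> -- true
```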
So, the underlying Multi-Arch issue here was that the container image being pulled did not support non-x86_64 architectures. That has been fixed and verified. If you are now hitting an intermittent issue with `oc debug node`, that is a new issue altogether, since it is not related to the underlying problem this bug was tracking. If you can reproduce the intermittent issue on your clusters, I suggest you file a new bug with reproduction instructions so that we can narrow in on the new problem.
As discussed in the bugzapper call, we are closing this issue as resolved and will open a new one as soon as we have re-tested whether the reliability needs to be improved.