Description of problem:
Failed to run `oc adm diagnostics NetworkCheck` with error:
chroot: failed to run command 'openshift-diagnostics': No such file or directory

Version-Release number of selected component (if applicable):
# oc version
oc v3.9.0-0.19.0
kubernetes v1.9.0-beta1
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
always

Steps to Reproduce:
1. Set up env with multitenant plugin
2. Run `oc adm diagnostics NetworkCheck`

Actual results:
# oc adm diagnostics NetworkCheck
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

Info:  Output from the network diagnostic pod on node "172.16.120.88":
       chroot: failed to run command 'openshift-diagnostics': No such file or directory
Info:  Output from the network diagnostic pod on node "172.16.120.100":
       chroot: failed to run command 'openshift-diagnostics': No such file or directory

[Note] Summary of diagnostics execution (version v3.9.0-0.19.0):
[Note] Completed with no errors or warnings seen.

Expected results:
No such error.

Additional info:
Found 2 pods that could not run:

# oc describe pod network-diag-pod-4vj7l -n network-diag-ns-jrhp9
Name:         network-diag-pod-4vj7l
Namespace:    network-diag-ns-jrhp9
Node:         172.16.120.88/172.16.120.88
Start Time:   Mon, 15 Jan 2018 04:50:14 -0500
Labels:       <none>
Annotations:  openshift.io/scc=privileged
Status:       Failed
IP:           172.16.120.88
Containers:
  network-diag-pod-4vj7l:
    Container ID:  docker://6574a1027c4b773fbbe120b583ba9a5990dcaf75806d9a30ef0c53a770f2767d
    Image:         openshift3/ose:v3.9.0-0.19.0
    Image ID:      docker-pullable://openshift3/ose@sha256:48ab445c678ee7a35ab9db61d3bd3dd015ac4de81f239ae4a45545b32e0d1f63
    Port:          <none>
    Command:
      /bin/bash
      -c
    Args:
      #!/bin/bash
      #
      # Based on containerized/non-containerized openshift install,
      # this script sets the environment so that docker, openshift, iptables, etc.
      # binaries are available for network diagnostics.
      #
      set -o nounset
      set -o pipefail

      node_rootfs=/host
      cmd="openshift-diagnostics network-diagnostic-pod -l 1"

      # Origin image: openshift/node, OSE image: openshift3/node
      node_image_regex="^openshift.*/node"
      node_container_id="$(chroot "${node_rootfs}" docker ps --format='{{.Image}} {{.ID}}' | grep "${node_image_regex}" | cut -d' ' -f2)"

      if [[ -z "${node_container_id}" ]]; then  # non-containerized openshift env
          chroot "${node_rootfs}" ${cmd}
      else  # containerized env
          # On containerized install, docker on the host is used by node container.
          # For the privileged network diagnostics pod to use all the binaries on the node:
          # - Copy kubeconfig secret to node mount namespace
          # - Run openshift under the mount namespace of node
          node_docker_pid="$(chroot "${node_rootfs}" docker inspect --format='{{.State.Pid}}' "${node_container_id}")"
          kubeconfig="/etc/origin/node/kubeconfig"
          cp "${node_rootfs}/secrets/kubeconfig" "${node_rootfs}/${kubeconfig}"
          chroot "${node_rootfs}" nsenter -m -t "${node_docker_pid}" -- /bin/bash -c 'KUBECONFIG='"${kubeconfig} ${cmd}"''
      fi
    State:          Terminated
      Reason:       Error
      Exit Code:    127
      Started:      Mon, 15 Jan 2018 04:50:16 -0500
      Finished:     Mon, 15 Jan 2018 04:50:16 -0500
    Ready:          False
    Restart Count:  0
    Environment:
      KUBECONFIG:  /secrets/kubeconfig
    Mounts:
      /host from host-root-dir (rw)
      /host/secrets from kconfig-secret (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5hrdw (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  host-root-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:
  kconfig-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  network-diag-secret
    Optional:    false
  default-token-5hrdw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5hrdw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type    Reason                 Age  From                    Message
  ----    ------                 ---  ----                    -------
  Normal  SuccessfulMountVolume  19s  kubelet, 172.16.120.88  MountVolume.SetUp succeeded for volume "host-root-dir"
  Normal  SuccessfulMountVolume  19s  kubelet, 172.16.120.88  MountVolume.SetUp succeeded for volume "default-token-5hrdw"
  Normal  SuccessfulMountVolume  19s  kubelet, 172.16.120.88  MountVolume.SetUp succeeded for volume "kconfig-secret"
  Normal  Pulled                 18s  kubelet, 172.16.120.88  Container image "openshift3/ose:v3.9.0-0.19.0" already present on machine
  Normal  Created                18s  kubelet, 172.16.120.88  Created container
  Normal  Started                17s  kubelet, 172.16.120.88  Started container
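For context on the failure mode: exit code 127 is the shell's "command not found". The script above chroots into the node's root filesystem (or the node container's mount namespace) before invoking openshift-diagnostics, so the binary has to exist on the node itself; having it in the diagnostic pod's image is not enough. A minimal check that could be run from inside the privileged diagnostic pod to see where the binary is missing (hypothetical commands, not taken from the bug report):

    # Check the node's root filesystem, which is what the script's chroot sees:
    chroot /host /bin/bash -c 'command -v openshift-diagnostics' \
        || echo "openshift-diagnostics not on node rootfs"
    # Check the diagnostic pod's own image for comparison:
    command -v openshift-diagnostics \
        || echo "openshift-diagnostics not in pod image either"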
This is a dup of https://github.com/openshift/origin/issues/18141. Luke has already proposed fixes: see https://github.com/openshift/origin/pull/18145 and the discussion in https://github.com/openshift/origin/issues/18141.
https://github.com/openshift/origin/pull/18186 is the current favored candidate fix.
*** Bug 1537478 has been marked as a duplicate of this bug. ***
https://github.com/openshift/origin/pull/18186 merged
The above issue has been fixed, but when I verified this bug on oc v3.9.0-0.38.0 I always hit 'Failed to pull image "openshift3/ose-deployer:v3.9.0-0.38.0": rpc error: code = Unknown desc = repository docker.io/openshift3/ose-deployer not found: does not exist or no pull access'.

# oc describe pod network-diag-test-pod-b4bfk -n network-diag-ns-2zf55
Name:         network-diag-test-pod-b4bfk
Namespace:    network-diag-ns-2zf55
Node:         172.16.120.64/172.16.120.64
Start Time:   Tue, 06 Feb 2018 05:13:46 -0500
Labels:       network-diag-pod-name=network-diag-test-pod-b4bfk
Annotations:  openshift.io/scc=anyuid
Status:       Pending
IP:
Containers:
  network-diag-test-pod-b4bfk:
    Container ID:
    Image:         openshift3/ose-deployer:v3.9.0-0.38.0
    Image ID:
    Port:          <none>
    Command:
      socat -T 1 -d TCP-l:8080,reuseaddr,fork,crlf system:"echo 'HTTP/1.0 200 OK'; echo 'Content-Type: text/plain'; echo; echo 'Hello OpenShift'"
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jbbbm (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-jbbbm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jbbbm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age                From                    Message
  ----     ------                 ---                ----                    -------
  Normal   SuccessfulMountVolume  31s                kubelet, 172.16.120.64  MountVolume.SetUp succeeded for volume "default-token-jbbbm"
  Normal   Pulling                29s                kubelet, 172.16.120.64  pulling image "openshift3/ose-deployer:v3.9.0-0.38.0"
  Warning  Failed                 27s                kubelet, 172.16.120.64  Failed to pull image "openshift3/ose-deployer:v3.9.0-0.38.0": rpc error: code = Unknown desc = repository docker.io/openshift3/ose-deployer not found: does not exist or no pull access
  Warning  Failed                 27s                kubelet, 172.16.120.64  Error: ErrImagePull
  Normal   SandboxChanged         0s (x10 over 27s)  kubelet, 172.16.120.64  Pod sandbox changed, it will be killed and re-created.
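The pull failure can be reproduced by hand on a node: because the image name carries no registry prefix, docker ends up resolving it against docker.io, which does not host openshift3/ose-deployer. A quick sketch of the check (registry.example.com is a placeholder for whatever internal registry actually hosts the pre-GA images):

    # Fails: the short name falls through to docker.io
    docker pull openshift3/ose-deployer:v3.9.0-0.38.0
    # Should succeed if an internal registry hosting the image is named explicitly
    docker pull registry.example.com/openshift3/ose-deployer:v3.9.0-0.38.0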
I would certainly not expect docker.io/openshift3/ose-deployer to exist (at any version). And of course this version is not yet shipped via registry.access.redhat.com either. To test this diagnostic with pre-GA OCP images you'll have to either:

1. configure docker to include a registry that does have exactly the right image requested, or
2. use the available flags on the NetworkCheck diagnostic to specify an image (probably including the registry) that is available -- see the sketch after this comment.

Since the test pods aren't actually getting deployed, I don't think you've yet verified that the diagnostic is able to invoke them successfully.

(Worth noting re: the methods above -- there's some question at https://github.com/openshift/origin/pull/18260#issuecomment-360302197 about whether NetworkCheck should continue to omit the registry from the default image; if that is reverted, then the second method would be required for testing, as with DiagnosticPod. Hopefully they will become consistent one way or the other shortly.)

There is actually a third method: docker tag all the necessary images on all nodes before testing the diagnostic. And to be quite clear, all of this is only necessary when using pre-GA or non-RH images.
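As a sketch of the second and third methods: the flag names below (--network-pod-image, --network-test-pod-image) are my recollection of the 3.9-era NetworkCheck options and should be confirmed against `oc adm diagnostics NetworkCheck --help`; registry.example.com again stands in for a registry that really hosts the images:

    # Method 2: point the diagnostic at images that are actually pullable
    oc adm diagnostics NetworkCheck \
        --network-pod-image=registry.example.com/openshift3/ose:v3.9.0-0.38.0 \
        --network-test-pod-image=registry.example.com/openshift3/ose-deployer:v3.9.0-0.38.0

    # Method 3: pre-tag on every node so the unprefixed default name resolves locally
    docker pull registry.example.com/openshift3/ose-deployer:v3.9.0-0.38.0
    docker tag registry.example.com/openshift3/ose-deployer:v3.9.0-0.38.0 \
        openshift3/ose-deployer:v3.9.0-0.38.0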
According to comment 7, once docker pulls the correct image the diagnostic works. Verified this bug.