Bug 1534513
| Summary: | 'oc adm diagnostics NetworkCheck' does not work | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao> |
| Component: | oc | Assignee: | Luke Meyer <lmeyer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Xingxing Xia <xxia> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.9.0 | CC: | anli, aos-bugs, bbennett, jokerman, lmeyer, mmccomas, vlaad |
| Target Milestone: | --- | | |
| Target Release: | 3.9.0 | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-11 18:26:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
This is a duplicate of https://github.com/openshift/origin/issues/18141

Luke has already proposed these fixes: https://github.com/openshift/origin/pull/18145 and https://github.com/openshift/origin/issues/18141

https://github.com/openshift/origin/pull/18186 is the current favored candidate fix.

*** Bug 1537478 has been marked as a duplicate of this bug. ***

The above issue has been fixed.
However, when I verified this bug on oc v3.9.0-0.38.0, I always hit 'Failed to pull image "openshift3/ose-deployer:v3.9.0-0.38.0": rpc error: code = Unknown desc = repository docker.io/openshift3/ose-deployer not found: does not exist or no pull access':
# oc describe pod network-diag-test-pod-b4bfk -n network-diag-ns-2zf55
Name: network-diag-test-pod-b4bfk
Namespace: network-diag-ns-2zf55
Node: 172.16.120.64/172.16.120.64
Start Time: Tue, 06 Feb 2018 05:13:46 -0500
Labels: network-diag-pod-name=network-diag-test-pod-b4bfk
Annotations: openshift.io/scc=anyuid
Status: Pending
IP:
Containers:
network-diag-test-pod-b4bfk:
Container ID:
Image: openshift3/ose-deployer:v3.9.0-0.38.0
Image ID:
Port: <none>
Command:
socat
-T
1
-d
TCP-l:8080,reuseaddr,fork,crlf
system:"echo 'HTTP/1.0 200 OK'; echo 'Content-Type: text/plain'; echo; echo 'Hello OpenShift'"
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-jbbbm (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-jbbbm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-jbbbm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 31s kubelet, 172.16.120.64 MountVolume.SetUp succeeded for volume "default-token-jbbbm"
Normal Pulling 29s kubelet, 172.16.120.64 pulling image "openshift3/ose-deployer:v3.9.0-0.38.0"
Warning Failed 27s kubelet, 172.16.120.64 Failed to pull image "openshift3/ose-deployer:v3.9.0-0.38.0": rpc error: code = Unknown desc = repository docker.io/openshift3/ose-deployer not found: does not exist or no pull access
Warning Failed 27s kubelet, 172.16.120.64 Error: ErrImagePull
Normal SandboxChanged 0s (x10 over 27s) kubelet, 172.16.120.64 Pod sandbox changed, it will be killed and re-created.
I would certainly not expect docker.io/openshift3/ose-deployer to exist (at any version). And of course this version is not yet shipped via registry.access.redhat.com either.

To test this diagnostic with pre-GA OCP images you'll have to either:

1. configure docker to include a registry that does have exactly the right image requested, or
2. use the available flags on the NetworkCheck diagnostic to specify an image (probably including the registry) that is available; see the sketch below.

Since the test pods aren't actually getting deployed, I don't think you've yet verified that the diagnostic is able to invoke them successfully.

(Worth noting re: the methods above: there's some question at https://github.com/openshift/origin/pull/18260#issuecomment-360302197 about whether NetworkCheck should continue to omit the registry from the default image; if that is reverted, then the second method would be required for testing, as with DiagnosticPod. Hopefully they will become consistent one way or the other shortly.)

There is actually a third method, I guess, which is to docker tag all the necessary images on all nodes before testing the diagnostic.

And to be quite clear, all of this is only necessary when using pre-GA or non-RH images.
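To illustrate the second method, a sketch (assuming the 3.9-era NetworkCheck flag names --network-pod-image and --network-test-pod-image, and using registry.example.com as a placeholder for a registry that actually carries the pre-GA images):

# oc adm diagnostics NetworkCheck \
    --network-pod-image=registry.example.com/openshift3/ose:v3.9.0-0.38.0 \
    --network-test-pod-image=registry.example.com/openshift3/ose-deployer:v3.9.0-0.38.0

The third method amounts to pre-seeding each node so the kubelet finds the default image name in the local docker cache, e.g.:

# docker pull registry.example.com/openshift3/ose-deployer:v3.9.0-0.38.0
# docker tag registry.example.com/openshift3/ose-deployer:v3.9.0-0.38.0 openshift3/ose-deployer:v3.9.0-0.38.0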
Description of problem:
Failed to run `oc adm diagnostics NetworkCheck` with error:
chroot: failed to run command 'openshift-diagnostics': No such file or directory

Version-Release number of selected component (if applicable):
oc version
oc v3.9.0-0.19.0
kubernetes v1.9.0-beta1
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
always

Steps to Reproduce:
1. Set up env with multitenant plugin
2. Run `oc adm diagnostics NetworkCheck`

Actual results:
# oc adm diagnostics NetworkCheck
[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at '/root/.kube/config'
[Note] Running diagnostic: NetworkCheck
Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint
Info: Output from the network diagnostic pod on node "172.16.120.88":
chroot: failed to run command 'openshift-diagnostics': No such file or directory
Info: Output from the network diagnostic pod on node "172.16.120.100":
chroot: failed to run command 'openshift-diagnostics': No such file or directory
[Note] Summary of diagnostics execution (version v3.9.0-0.19.0):
[Note] Completed with no errors or warnings seen.

Expected results:
No such error.

Additional info:
Found 2 pods that cannot run:

# oc describe pod network-diag-pod-4vj7l -n network-diag-ns-jrhp9
Name: network-diag-pod-4vj7l
Namespace: network-diag-ns-jrhp9
Node: 172.16.120.88/172.16.120.88
Start Time: Mon, 15 Jan 2018 04:50:14 -0500
Labels: <none>
Annotations: openshift.io/scc=privileged
Status: Failed
IP: 172.16.120.88
Containers:
network-diag-pod-4vj7l:
Container ID: docker://6574a1027c4b773fbbe120b583ba9a5990dcaf75806d9a30ef0c53a770f2767d
Image: openshift3/ose:v3.9.0-0.19.0
Image ID: docker-pullable://openshift3/ose@sha256:48ab445c678ee7a35ab9db61d3bd3dd015ac4de81f239ae4a45545b32e0d1f63
Port: <none>
Command:
/bin/bash
-c
Args:
#!/bin/bash
#
# Based on containerized/non-containerized openshift install,
# this script sets the environment so that docker, openshift, iptables, etc.
# binaries are available for network diagnostics.
# set -o nounset
set -o pipefail

node_rootfs=/host
cmd="openshift-diagnostics network-diagnostic-pod -l 1"

# Origin image: openshift/node, OSE image: openshift3/node
node_image_regex="^openshift.*/node"
node_container_id="$(chroot "${node_rootfs}" docker ps --format='{{.Image}} {{.ID}}' | grep "${node_image_regex}" | cut -d' ' -f2)"

if [[ -z "${node_container_id}" ]]; then
# non-containerized openshift env
chroot "${node_rootfs}" ${cmd}
else
# containerized env
# On containerized install, docker on the host is used by node container,
# For the privileged network diagnostics pod to use all the binaries on the node:
# - Copy kubeconfig secret to node mount namespace
# - Run openshift under the mount namespace of node
node_docker_pid="$(chroot "${node_rootfs}" docker inspect --format='{{.State.Pid}}' "${node_container_id}")"
kubeconfig="/etc/origin/node/kubeconfig"
cp "${node_rootfs}/secrets/kubeconfig" "${node_rootfs}/${kubeconfig}"
chroot "${node_rootfs}" nsenter -m -t "${node_docker_pid}" -- /bin/bash -c 'KUBECONFIG='"${kubeconfig} ${cmd}"''
fi

State: Terminated
Reason: Error
Exit Code: 127
Started: Mon, 15 Jan 2018 04:50:16 -0500
Finished: Mon, 15 Jan 2018 04:50:16 -0500
Ready: False
Restart Count: 0
Environment:
KUBECONFIG: /secrets/kubeconfig
Mounts:
/host from host-root-dir (rw)
/host/secrets from kconfig-secret (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5hrdw (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
host-root-dir:
Type: HostPath (bare host directory volume)
Path: /
HostPathType:
kconfig-secret:
Type: Secret (a volume populated by a Secret)
SecretName: network-diag-secret
Optional: false
default-token-5hrdw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5hrdw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 19s kubelet, 172.16.120.88 MountVolume.SetUp succeeded for volume "host-root-dir"
Normal SuccessfulMountVolume 19s kubelet, 172.16.120.88 MountVolume.SetUp succeeded for volume "default-token-5hrdw"
Normal SuccessfulMountVolume 19s kubelet, 172.16.120.88 MountVolume.SetUp succeeded for volume "kconfig-secret"
Normal Pulled 18s kubelet, 172.16.120.88 Container image "openshift3/ose:v3.9.0-0.19.0" already present on machine
Normal Created 18s kubelet, 172.16.120.88 Created container
Normal Started 17s kubelet, 172.16.120.88 Started container
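The Exit Code 127 is consistent with the script above: on a non-containerized install it runs the command via `chroot "${node_rootfs}" ${cmd}`, so the failure simply means the node host has no openshift-diagnostics binary on its PATH. A minimal sanity check, run directly on an affected node host (nothing assumed here beyond standard shell tools):

# which openshift-diagnostics || echo "openshift-diagnostics not on PATH"

On nodes hitting this bug the binary is absent, matching the exit code 127 from the pod; that missing binary is what the fixes linked above address.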