Bug 1421643
Summary: | 'oadm diagnostics NetworkCheck' timeout due to image 'openshift/diagnostics-deployer' pull failed | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao> | |
Component: | Networking | Assignee: | Ravi Sankar <rpenta> | |
Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 3.5.0 | CC: | aos-bugs, bbennett, eparis, jkaur, mluther, myllynen, rpenta, smunilla | |
Target Milestone: | --- | Keywords: | Reopened | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1481550 1481551 | Environment: | |
Last Closed: | 2017-08-10 05:17:28 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1481550, 1481551, 1505898 |
Description
zhaozhanqi
2017-02-13 10:41:22 UTC
The image isn't available yet. But for OpenShift Container Platform, it needs to be openshift3/ose-diagnostics-deployer

Ravi, because of the 75 bazillion flakes we need this fixed in origin/master and origin/release-1.5. Can you do that backport?

PR on origin/release-1.5: https://github.com/openshift/origin/pull/13062

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/b640a0e55fdebb6e2b3eb52fc404ecfee16ead98
Bug 1421643 - Use existing openshift/origin image instead of new openshift/diagnostics-deployer

Any new image like 'openshift/diagnostics-deployer' incurs build/lifecycle costs to maintain, and the diagnostics-deployer image contains only a small block of shell code. To alleviate this problem, the script is now embedded into the pod definition and openshift/origin is used as the diagnostics deployer image. On dev machines, openshift/origin is currently close to 800MB, but we expect the size to be under 200MB when it is released (compressed, debug headers removed).

It still does not seem to work well in an Atomic Host environment.

# cat /etc/redhat-release
Red Hat Enterprise Linux Atomic Host release 7.3

# oadm diagnostics NetworkCheck
[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at '/root/.kube/config'
[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint
Info: Output from the network diagnostic pod on node "host-8-175-189.host.centralci.eng.rdu2.redhat.com":
Info: Output from the network diagnostic pod on node "host-8-174-76.host.centralci.eng.rdu2.redhat.com":
[Note] Summary of diagnostics execution (version v3.5.0.35):
[Note] Completed with no errors or warnings seen.

***************************
Checking the logs shows that some pods still cannot run:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
8s 8s 1 {kubelet host-8-175-189.host.centralci.eng.rdu2.redhat.com} spec.containers{network-diag-pod-kbst5} Normal Pulled Container image "openshift3/ose" already present on machine
8s 8s 1 {kubelet host-8-175-189.host.centralci.eng.rdu2.redhat.com} spec.containers{network-diag-pod-kbst5} Normal Created Created container with docker id 528da7892463; Security:[seccomp=unconfined]
7s 7s 1 {kubelet host-8-175-189.host.centralci.eng.rdu2.redhat.com} spec.containers{network-diag-pod-kbst5} Warning Failed Failed to start container with docker id 528da7892463 with error: Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:54: mounting \\\\\\\\\\\\\\\"/var/lib/origin/openshift.local.volumes/pods/af3c94ac-fe3d-11e6-a07f-fa163ebcec06/volumes/kubernetes.io~secret/kconfig-secret\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/7bbe3c708f48e9867c0a16f0e5fd162337c88284e3972b6e971d3dcc7abb6b5c/rootfs\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/7bbe3c708f48e9867c0a16f0e5fd162337c88284e3972b6e971d3dcc7abb6b5c/rootfs/host/secrets\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"mkdir /var/lib/docker/devicemapper/mnt/7bbe3c708f48e9867c0a16f0e5fd162337c88284e3972b6e971d3dcc7abb6b5c/rootfs/host/secrets: permission denied\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
7s 7s 1 {kubelet host-8-175-189.host.centralci.eng.rdu2.redhat.com} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "network-diag-pod-kbst5" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"process_linux.go:359: container init caused \\\\\\\\\\\\\\\"rootfs_linux.go:54: mounting \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"/var/lib/origin/openshift.local.volumes/pods/af3c94ac-fe3d-11e6-a07f-fa163ebcec06/volumes/kubernetes.io~secret/kconfig-secret\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/7bbe3c708f48e9867c0a16f0e5fd162337c88284e3972b6e971d3dcc7abb6b5c/rootfs\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/7bbe3c708f48e9867c0a16f0e5fd162337c88284e3972b6e971d3dcc7abb6b5c/rootfs/host/secrets\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"mkdir /var/lib/docker/devicemapper/mnt/7bbe3c708f48e9867c0a16f0e5fd162337c88284e3972b6e971d3dcc7abb6b5c/rootfs/host/secrets: permission denied\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\"\\\\n\\\"\"}"

Ravi is working on this to make the code more robust when only some nodes manage to pull the image.

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/4f5f8a6ae45b19e7c4c00cee2025bcf329d77b34
Bug 1421643 - Fix network diagnostics timeouts

waitForNetworkPod() is called from a few places and has a fixed timeout of 82 seconds, which was insufficient in a few cases where network bandwidth is low or network latency is high. This change makes waitForNetworkPod() take a custom timeout value based on the operation performed.

Checked this issue on OpenShift v3.6.94 in an Atomic Host environment.
Still hit the error when describing the pod with `oc describe pod network-diag-pod-rwr53 -n network-diag-ns-w06vl`:

Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
27s 27s 1 kubelet, ip-172-18-7-114.ec2.internal spec.containers{network-diag-pod-rwr53} Normal Pulled Container image "registry.access.redhat.com/openshift3/ose" already present on machine
27s 27s 1 kubelet, ip-172-18-7-114.ec2.internal spec.containers{network-diag-pod-rwr53} Normal Created Created container with id 74442f6f035c0a61258c771e21b014669269c2bda0a83079e33afbfbee610431
26s 26s 1 kubelet, ip-172-18-7-114.ec2.internal spec.containers{network-diag-pod-rwr53} Warning Failed Failed to start container with id 74442f6f035c0a61258c771e21b014669269c2bda0a83079e33afbfbee610431 with error: rpc error: code = 2 desc = failed to start container "74442f6f035c0a61258c771e21b014669269c2bda0a83079e33afbfbee610431": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:54: mounting \\\\\\\\\\\\\\\"/var/lib/origin/openshift.local.volumes/pods/9eb87cf1-49b9-11e7-a4f6-0e2345259696/volumes/kubernetes.io~secret/kconfig-secret\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/3478616fca5a482f2c2a809dca2309074e025758abdf7e365faad7d2ff6eaff4/rootfs\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/3478616fca5a482f2c2a809dca2309074e025758abdf7e365faad7d2ff6eaff4/rootfs/host/secrets\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"mkdir /var/lib/docker/devicemapper/mnt/3478616fca5a482f2c2a809dca2309074e025758abdf7e365faad7d2ff6eaff4/rootfs/host/secrets: permission denied\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}
26s 26s 1 kubelet, ip-172-18-7-114.ec2.internal Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "network-diag-pod-rwr53" with rpc error: code = 2 desc = failed to start container "74442f6f035c0a61258c771e21b014669269c2bda0a83079e33afbfbee610431": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:54: mounting \\\\\\\\\\\\\\\"/var/lib/origin/openshift.local.volumes/pods/9eb87cf1-49b9-11e7-a4f6-0e2345259696/volumes/kubernetes.io~secret/kconfig-secret\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/3478616fca5a482f2c2a809dca2309074e025758abdf7e365faad7d2ff6eaff4/rootfs\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/devicemapper/mnt/3478616fca5a482f2c2a809dca2309074e025758abdf7e365faad7d2ff6eaff4/rootfs/host/secrets\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"mkdir /var/lib/docker/devicemapper/mnt/3478616fca5a482f2c2a809dca2309074e025758abdf7e365faad7d2ff6eaff4/rootfs/host/secrets: permission denied\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""}: "Start Container Failed"

*** Bug 1439142 has been marked as a duplicate of this bug. ***

The original timeout issue is fixed; the current issue in comment #10 is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1393716 (comment #17). Closing this bug as the current issue has an open bug.

*** This bug has been marked as a duplicate of bug 1393716 ***

As comment 12 said, the original timeout issue is fixed, so in fact this is not a duplicate bug. Marked this as 'verified'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716