Description of problem:
In a disconnected environment, oc adm diagnostics gets stuck:

$ oc adm diagnostics --images=registry.example.com:5000/openshift3/ose-deployer:v3.6.173.0.5 --network-pod-image=registry.example.com:5000/openshift3/ose-deployer:v3.6.173.0.5
...
[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

^CERROR: [DNet2006 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:136]
       Creating network diagnostic pod "network-diag-pod-qth3j" on node "infra01.test.example.com" with command "openshift infra network-diagnostic-pod -l 1" failed: namespaces "network-diag-ns-nzdz1" not found

Not sure whether this is related to this later warning:

ERROR: [DClu1019 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:343]
       Diagnostics created a test ImageStream and compared the registry IP it received to the registry IP available via the docker-registry service.

       docker-registry      : 172.30.116.71:5000
       ImageStream registry : docker-registry.default.svc:5000

       They do not match, which probably means that an administrator re-created the docker-registry service but the master has cached the old service IP address. Builds or deployments that use ImageStreams with the wrong docker-registry IP will fail under this condition. To resolve this issue, restarting the master (to clear the cache) should be sufficient. Existing ImageStreams may need to be re-created.

Version-Release number of selected component (if applicable):
atomic-openshift-clients-3.6.173.0.5-1.git.0.f30b99e.el7.x86_64
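For anyone hitting the DNet2006 "namespaces ... not found" error, a quick sketch for watching the transient objects NetworkCheck creates, to see whether the namespace disappears before the diagnostic pod starts (assumes cluster-admin access; the network-diag-ns-* / network-diag-pod-* names are generated per run, so substitute the ones from your output):

  # In a second terminal while NetworkCheck is running (or stuck):
  $ oc get ns -w | grep network-diag                          # watch the transient namespaces appear/disappear
  $ oc get pods --all-namespaces -o wide | grep network-diag
  $ oc describe pod network-diag-pod-qth3j -n network-diag-ns-nzdz1   # pod events usually show the scheduling or image pull failure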
Also

$ oc adm diagnostics --images=registry.example.com:5000/openshift3/ose-deployer:v3.6.173.0.5 --network-pod-image=registry.example.com:5000/openshift3/ose-deployer:v3.6.173.0.5 --network-test-pod-image=registry.example.com:5000/openshift3/ose-deployer:v3.6.173.0.5

is failing:

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

ERROR: [DNet2005 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:119]
       Setting up test environment for network diagnostics failed: Failed to run network diags test pod and service: Failed to run network diags test pods, failed: 40, total: 40

I've reported the registry issue separately at https://bugzilla.redhat.com/show_bug.cgi?id=1488059 so we can keep this BZ only about the network test. Thanks.
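Since DNet2005 only reports pod counts ("failed: 40, total: 40"), here is a sketch for collecting the per-pod failure reasons before the transient namespace is cleaned up (assumes you suspend the diagnostics command, e.g. with Ctrl-Z, to keep the namespace around long enough; the grep pattern matches the generated network-diag-ns-* names):

  $ NS=$(oc get ns -o name | grep network-diag-ns | head -1 | cut -d/ -f2)
  $ oc get pods -n "$NS" -o wide
  $ oc get events -n "$NS" --sort-by=.lastTimestamp
  $ for p in $(oc get pods -n "$NS" -o name); do oc logs -n "$NS" "$p"; done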
I also run into this issue on two separate clusters. The node logs contain entries like the ones below, and if you grab the logs out of a test pod (for example by suspending the diagnostics command so you have time to do so), the pod log only contains:

error: --deployment or OPENSHIFT_DEPLOYMENT_NAME is required

Node log:

Sep 01 13:59:26 ocp-4 oci-register-machine[38529]: 2017/09/01 13:59:26 Register machine: prestart d8d507821244185b55f3486316140f2748aa211b5c47396f67e715b49fbdaf7b 385 15 /var/lib/docker/devicemapper/mnt/f6fbd5d819bdf917d23a1b90b6096262670219414f59a35b7cba1d65e038c828/rootfs
Sep 01 13:59:26 ocp-4 systemd-machined[2963]: New machine d8d507821244185b55f3486316140f27.
Sep 01 13:59:26 ocp-4 oci-systemd-hook[38536]: systemdhook <info>: gidMappings not found in config
Sep 01 13:59:26 ocp-4 oci-systemd-hook[38536]: systemdhook <debug>: GID: 0
Sep 01 13:59:26 ocp-4 oci-systemd-hook[38536]: systemdhook <info>: uidMappings not found in config
Sep 01 13:59:26 ocp-4 oci-systemd-hook[38536]: systemdhook <debug>: UID: 0
Sep 01 13:59:26 ocp-4 oci-systemd-hook[38536]: systemdhook <debug>: Skipping as container command is /usr/bin/openshift-deploy, not init or systemd
...
Sep 01 13:59:27 ocp-4 kernel: XFS (dm-19): Starting recovery (logdev: internal)
Sep 01 13:59:27 ocp-4 dockerd-current[1317]: error: --deployment or OPENSHIFT_DEPLOYMENT_NAME is required
Sep 01 13:59:27 ocp-4 systemd[1]: Stopped docker container 99ecb2540ef8246bb60f4b85b486ab5110c91bf39e365719efaae13ef73d984d.
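That pod log matches the deployer image's default entrypoint: per the node log above the container command is /usr/bin/openshift-deploy, which exits immediately unless it is told which deployment to run. A minimal sketch to confirm this outside of diagnostics (net-diag-entrypoint-test is a hypothetical pod name; the image is the one from the reports above):

  # Run the deployer image with its default entrypoint; the pod should fail
  # with the same "--deployment or OPENSHIFT_DEPLOYMENT_NAME is required" error:
  $ oc run net-diag-entrypoint-test --image=registry.example.com:5000/openshift3/ose-deployer:v3.6.173.0.5 --restart=Never
  $ oc logs net-diag-entrypoint-test

  # Overriding the command keeps the pod alive, which is what a network test pod needs:
  $ oc run net-diag-entrypoint-test2 --image=registry.example.com:5000/openshift3/ose-deployer:v3.6.173.0.5 --restart=Never --command -- sleep 600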
I can replicate this issue as well, on a fresh install of a connected OCP 3.6 (OpenShift Master: v3.6.173.0.5, Kubernetes Master: v1.6.1+5115d708d7):

[root@ocpm-0 ~]# oc adm diagnostics
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'appx-prod/rhf17ocpmaster-northeurope-cloudapp-azure-com:8443/system:admin'
[Note] Performing systemd discovery

[Note] Running diagnostic: ConfigContexts[appx-test/rhf17ocpmaster-northeurope-cloudapp-azure-com:8443/system:admin]
       Description: Validate client config context is complete and has connectivity

Info:  For client config context 'appx-test/rhf17ocpmaster-northeurope-cloudapp-azure-com:8443/system:admin':
       The server URL is 'https://rhf17ocpmaster.northeurope.cloudapp.azure.com:8443'
       The user authentication is 'system:admin/rhf17ocpmaster-northeurope-cloudapp-azure-com:8443'
       The current project is 'appx-test'
       Successfully requested project list; has access to project(s):
         [appx-dev appx-prod appx-test default demo-hpa kube-public kube-system logging management-infra openshift ...]

[Note] Running diagnostic: DiagnosticPod
       Description: Create a pod to run diagnostics from the application standpoint

Info:  Output from the diagnostic pod (image registry.access.redhat.com/openshift3/ose-deployer:v3.6.173.0.5):
       [Note] Running diagnostic: PodCheckAuth
              Description: Check that service account credentials authenticate as expected
       Info:  Service account token successfully authenticated to master
       Info:  Service account token was authenticated by the integrated registry.
       [Note] Running diagnostic: PodCheckDns
              Description: Check that DNS within a pod works as expected
       [Note] Summary of diagnostics execution (version v3.6.173.0.5):
       [Note] Completed with no errors or warnings seen.

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

ERROR: [DNet2005 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:119]
       Setting up test environment for network diagnostics failed: Failed to run network diags test pod and service: Failed to run network diags test pods, failed: 14, total: 16
*** Bug 1476232 has been marked as a duplicate of this bug. ***
Fixed in https://github.com/openshift/origin/pull/16439
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/85aa3968871269b21c823edef55233bac9adbd01
Bug 1481147 - Fix default pod image for network diagnostics
- This also ensures the network diagnostics pod and test pod images use the deployed openshift version/tag (not the latest), so that another copy of the same image with the latest tag does not need to be downloaded.

https://github.com/openshift/origin/commit/fc7190d95e791408538abef8f779ba8493bec867
Merge pull request #16439 from pravisankar/netdiags-image-check

Automatic merge from submit-queue

Bug 1481147 - Fix default pod image for network diagnostics
- This also ensures the network diagnostics pod and test pod images use the deployed openshift version/tag (not the latest), so that another copy of the same image with the latest tag does not need to be downloaded.
- Print more details when network diagnostics test setup fails. Currently, when network diags fails, it only reports how many test pods failed but not why they failed. This change fetches the pod logs in case of setup failure.
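In practical terms, the first commit means that on a fixed build the diagnostic should work without any image flags, since the defaults now carry the deployed version's tag instead of :latest (a sketch of the expected behavior, not output from an actual run):

  $ oc version                       # shows the deployed tag the image defaults are derived from
  $ oc adm diagnostics NetworkCheck  # no --network-pod-image/--network-test-pod-image needed;
                                     # no extra :latest image has to be downloaded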
Hello, when will the fix be released in the enterprise edition? In which version will it be included? Thank you.
Should be released with OCP 3.7 at GA. Not sure if something needs to be done to get this bug attached to the errata, but QE should have a build to test so moving to ON_QA.
Verified this bug on v3.7.0-0.143.2:

# oadm diagnostics NetworkCheck --network-pod-image='registry.ops.openshift.com/openshift3/ose:v3.7.0-0.143.2' --network-test-pod-image='registry.ops.openshift.com/openshift3/ose-deployer:v3.7.0-0.143.2'
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

Info:  Output from the network diagnostic pod on node "ip-172-18-3-195.ec2.internal":
       [Note] Running diagnostic: CheckExternalNetwork
              Description: Check that external network is accessible within a pod
       [Note] Running diagnostic: CheckNodeNetwork
              Description: Check that pods in the cluster can access its own node.
       [Note] Running diagnostic: CheckPodNetwork
              Description: Check pod to pod communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with each other and in case of multitenant network plugin, pods in non-global projects should be isolated and pods in global projects should be able to access any pod in the cluster and vice versa.
       [Note] Running diagnostic: CheckServiceNetwork
              Description: Check pod to service communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with all services and in case of multitenant network plugin, services in non-global projects should be isolated and pods in global projects should be able to access any service in the cluster.
       [Note] Running diagnostic: CollectNetworkInfo
              Description: Collect network information in the cluster.
       [Note] Summary of diagnostics execution (version v3.7.0-0.143.2):
       [Note] Completed with no errors or warnings seen.

Info:  Output from the network diagnostic pod on node "ip-172-18-2-33.ec2.internal":
       [Note] Running diagnostic: CheckExternalNetwork
              Description: Check that external network is accessible within a pod
       [Note] Running diagnostic: CheckNodeNetwork
              Description: Check that pods in the cluster can access its own node.
       [Note] Running diagnostic: CheckPodNetwork
              Description: Check pod to pod communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with each other and in case of multitenant network plugin, pods in non-global projects should be isolated and pods in global projects should be able to access any pod in the cluster and vice versa.
       [Note] Running diagnostic: CheckServiceNetwork
              Description: Check pod to service communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with all services and in case of multitenant network plugin, services in non-global projects should be isolated and pods in global projects should be able to access any service in the cluster.
       [Note] Running diagnostic: CollectNetworkInfo
              Description: Collect network information in the cluster.
       [Note] Summary of diagnostics execution (version v3.7.0-0.143.2):
       [Note] Completed with no errors or warnings seen.
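For disconnected installs, the same verification can be pointed at an internal mirror by overriding both images explicitly, as in the original report (a sketch; registry.example.com:5000 stands in for your mirror registry):

# oadm diagnostics NetworkCheck --network-pod-image='registry.example.com:5000/openshift3/ose:v3.7.0-0.143.2' --network-test-pod-image='registry.example.com:5000/openshift3/ose-deployer:v3.7.0-0.143.2'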
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188