Bug 1388026 - [networking_public_59]Should delete the test namespace even if creation of the test pod failed when running 'oadm diagnostics NetworkCheck'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Networking
Version: 3.x
Hardware: All
OS: All
medium
medium
Target Milestone: ---
: ---
Assignee: Ravi Sankar
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-24 09:11 UTC by zhaozhanqi
Modified: 2016-12-09 21:52 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-09 21:52:55 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Origin (Github) 11719 0 None None None 2016-11-03 14:46:35 UTC
Origin (Github) 11818 0 None None None 2016-11-08 13:38:39 UTC

Description zhaozhanqi 2016-10-24 09:11:09 UTC
Description of problem:
When the test pods fail to be created in the network-diag-ns-xxxx namespaces, 'oadm diagnostics NetworkCheck' should still delete those namespaces at the end. Otherwise, the leftover namespaces and pods will affect the next run.

Version-Release number of selected component (if applicable):
 #openshift version
openshift v1.4.0-alpha.0+0787d9f-738
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1


How reproducible:
always

Steps to Reproduce:
1. Set up openshift cluster with 2 nodes
2. Make one node fail to pull the test image
3. Run 'oadm diagnostics NetworkCheck'
4. Check the test pod in the test namespace

Actual results:

# oadm diagnostics NetworkCheck 
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint
       
ERROR: [DNet2005 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:101]
       Setting up test environment for network diagnostics failed: Failed to run network diags test pod and service:
timed out waiting for the condition
       
[Note] Summary of diagnostics execution (version v1.4.0-alpha.0+0787d9f-738):
[Note] Errors seen: 1

step 4:
oc get pod --all-namespaces -o wide
NAMESPACE               NAME                           READY     STATUS              RESTARTS   AGE       IP              NODE
default                 docker-registry-2-elbmv        1/1       Running             0          1h        10.1.2.7        ip-172-18-15-181.ec2.internal
default                 docker-registry-2-wsb31        1/1       Running             1          2h        10.1.2.3        ip-172-18-15-181.ec2.internal
default                 registry-console-1-nhlv1       1/1       Running             0          1h        10.1.2.8        ip-172-18-15-181.ec2.internal
default                 router-1-8gwi3                 1/1       Running             1          2h        172.18.15.181   ip-172-18-15-181.ec2.internal
default                 router-1-apwkm                 0/1       Pending             0          32m       <none>          
install-test            dancer-mysql-example-1-build   0/1       Completed           0          2h        10.1.2.2        ip-172-18-15-181.ec2.internal
install-test            dancer-mysql-example-1-sw6og   1/1       Running             6          2h        10.1.2.2        ip-172-18-15-181.ec2.internal
install-test            database-1-l2hvc               1/1       Running             0          1h        10.1.2.9        ip-172-18-15-181.ec2.internal
network-diag-ns-1jf8t   network-diag-test-pod-70os8    0/1       ContainerCreating   0          12m       <none>          ip-172-18-13-225.ec2.internal
network-diag-ns-1jf8t   network-diag-test-pod-mga0r    0/1       ContainerCreating   0          12m       <none>          ip-172-18-13-225.ec2.internal
network-diag-ns-1jf8t   network-diag-test-pod-r30nm    0/1       OutOfpods           0          12m       <none>          ip-172-18-15-181.ec2.internal
network-diag-ns-1jf8t   network-diag-test-pod-ryi4a    0/1       ContainerCreating   0          12m       <none>          ip-172-18-13-225.ec2.internal
network-diag-ns-1jf8t   network-diag-test-pod-tkxum    0/1       OutOfpods           0          12m       <none>          ip-172-18-15-181.ec2.internal
Expected results:
1. Should ignore the not-ready node and print a meaningful message such as 'node ip-172-18-15-180.ec2.internal is not ready'
Additional info:

Comment 1 Ravi Sankar 2016-11-02 02:26:20 UTC
github pr: https://github.com/openshift/origin/pull/11719

Comment 2 Ravi Sankar 2016-11-02 02:43:06 UTC
@zhaozhanqi 
'make one node pull the test image failed' => did you remove the existing hello-openshift image and disable internet access for docker containers by setting sysctl net.bridge.bridge-nf-call-iptables/net.ipv4.ip_forward=0?

Comment 3 zhaozhanqi 2016-11-02 05:54:26 UTC
@ravi

I configured a wrong registry, so pulling the hello-openshift image failed.

Comment 4 openshift-github-bot 2016-11-04 05:31:39 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/78378e8969467d7c2a51033a60423b7a43bdac18
Bug 1388026 - Ensure deletion of namespaces created by network diagnostics command

Comment 5 zhaozhanqi 2016-11-08 02:11:46 UTC
Tested this issue on: 
openshift version
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


I can still reproduce this issue in my local env.

I'm not sure, but it may be because there is not enough CPU and memory, so the pods cannot run.

# oadm diagnostics NetworkCheck
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint
       
ERROR: [DNet2005 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:114]
       Setting up test environment for network diagnostics failed: Failed to run network diags test pod and service:
[timed out waiting for the condition, timed out waiting for the condition]
       
[Note] Summary of diagnostics execution (version v3.4.0.23+24b1a58):
[Note] Errors seen: 1
[root@minion1 ~]# oc get pod --all-namespaces
NAMESPACE                      NAME                          READY     STATUS              RESTARTS   AGE
default                        caddy-docker                  1/1       Running             1          4d
default                        router-9-70hqt                1/1       Running             0          3d
default                        test-rc-jzgcp                 1/1       Running             0          3d
default                        test-rc-wnf9x                 1/1       Running             0          3d
network-diag-global-ns-dddni   network-diag-test-pod-l9xm8   0/1       OutOfpods           0          2m
network-diag-global-ns-dddni   network-diag-test-pod-pwal3   0/1       OutOfpods           0          2m
network-diag-global-ns-dddni   network-diag-test-pod-ufdwz   0/1       OutOfpods           0          2m
network-diag-global-ns-rjx7c   network-diag-test-pod-49win   0/1       OutOfpods           0          2m
network-diag-global-ns-rjx7c   network-diag-test-pod-adtmm   0/1       OutOfpods           0          2m
network-diag-global-ns-rjx7c   network-diag-test-pod-tzejk   0/1       OutOfpods           0          2m
network-diag-ns-t7mnc          network-diag-test-pod-eshrk   0/1       OutOfpods           0          2m
network-diag-ns-t7mnc          network-diag-test-pod-w5id3   1/1       Running             0          2m
network-diag-ns-t7mnc          network-diag-test-pod-wuiuw   0/1       OutOfpods           0          2m
network-diag-ns-vtdqb          network-diag-test-pod-c3sbw   0/1       ContainerCreating   0          2m
network-diag-ns-vtdqb          network-diag-test-pod-iv4fl   1/1       Running             0          2m
network-diag-ns-vtdqb          network-diag-test-pod-ksy20   0/1       ContainerCreating   0          2m
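
To check for leftovers without reading the listing by eye, the output above can be filtered for namespaces that start with "network-diag-". This is only a minimal sketch: the helper name leftoverNamespaces is hypothetical, and the prefix is assumed from the namespace names shown in this report, not taken from the origin source.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// leftoverNamespaces scans the text printed by 'oc get pod --all-namespaces'
// and returns the distinct namespaces whose name starts with "network-diag-".
// The prefix is an assumption based on the names seen in this bug report.
func leftoverNamespaces(output string) []string {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		// The first column is the namespace; the header row does not match the prefix.
		if strings.HasPrefix(fields[0], "network-diag-") {
			seen[fields[0]] = true
		}
	}
	names := make([]string, 0, len(seen))
	for ns := range seen {
		names = append(names, ns)
	}
	sort.Strings(names)
	return names
}

func main() {
	// A few rows copied from the listing above.
	sample := `NAMESPACE                      NAME                          READY     STATUS              RESTARTS   AGE
default                        router-9-70hqt                1/1       Running             0          3d
network-diag-global-ns-dddni   network-diag-test-pod-l9xm8   0/1       OutOfpods           0          2m
network-diag-ns-t7mnc          network-diag-test-pod-w5id3   1/1       Running             0          2m
`
	for _, ns := range leftoverNamespaces(sample) {
		fmt.Println("leftover namespace:", ns)
	}
}
```

An empty result after the diagnostic exits would indicate that cleanup ran; any returned name is a namespace that was left behind.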

Comment 6 Ravi Sankar 2016-11-08 03:10:03 UTC
Fixed in https://github.com/openshift/origin/pull/11818

Comment 7 openshift-github-bot 2016-11-09 01:22:37 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/9566c24cc21b5480b98822fbed67c762fbae1923
Bug 1388026 - Fix network diagnostics cleanup when test setup fails

Diagnostics cleanup is called after completion of the network diagnostics tests
or anything goes wrong after the setup (launching test pods/services) is done.
But setup itself can fail if it is unable to deploy pods or services on the nodes.
So make cleanup to be called even if setup fails.
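
The idea in the commit message above can be illustrated with a small Go sketch. This is not the code from the origin repository: fakeCluster, setup, cleanup and runDiagnostics are hypothetical names, and the cluster is simulated in memory. It only shows the general pattern of registering cleanup before setup runs, so that a setup failure still removes the test namespace.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCluster is a hypothetical in-memory stand-in for a cluster.
// It only tracks which test namespaces currently exist.
type fakeCluster struct {
	namespaces map[string]bool
}

// setup creates the test namespace and then tries to start the test pods.
// failPods simulates pods that never become ready (for example, an image pull failure).
func (c *fakeCluster) setup(ns string, failPods bool) error {
	c.namespaces[ns] = true
	if failPods {
		return errors.New("timed out waiting for the condition")
	}
	return nil
}

// cleanup removes the test namespace; calling it for a missing namespace is harmless.
func (c *fakeCluster) cleanup(ns string) {
	delete(c.namespaces, ns)
}

// runDiagnostics defers cleanup before calling setup, so the namespace is
// removed whether setup fails, the checks fail, or everything succeeds.
func runDiagnostics(c *fakeCluster, ns string, failPods bool) error {
	defer c.cleanup(ns)

	if err := c.setup(ns, failPods); err != nil {
		return fmt.Errorf("setting up test environment failed: %w", err)
	}
	// The actual network checks would run here.
	return nil
}

func main() {
	c := &fakeCluster{namespaces: map[string]bool{}}
	err := runDiagnostics(c, "network-diag-ns-example", true)
	fmt.Println("error:", err)
	fmt.Println("namespaces left:", len(c.namespaces))
}
```

Deferring the cleanup at the top of the function keeps the failure and success paths identical, which is why a partially completed setup cannot leave a namespace behind in this sketch.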

Comment 8 zhaozhanqi 2016-11-14 08:47:46 UTC
This issue should be fixed in 
openshift version
openshift v3.4.0.25+1f36858
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


Could you help change the status to 'ON_QA'? I will then verify this bug.

Comment 9 zhaozhanqi 2016-11-15 01:20:39 UTC
Verified this bug according to comment 8

