Bug 1277383
Summary: | ovs-port wasn't deleted when openshift deleted pods | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Anping Li <anli>
Component: | Networking | Assignee: | Dan Williams <dcbw>
Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 3.0.0 | CC: | anli, aos-bugs, bleanhar, dcbw, dmcphers, yadu
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | atomic-openshift-3.1.0.4.git.10.ec10652 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-01-26 19:16:42 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Anping Li
2015-11-03 08:34:47 UTC
[root@openshift-minion-1 vagrant]# cat /var/log/openvswitch/ovs-vswitchd.log|grep 'could not open network device' |wc
[root@openshift-minion-1 vagrant]# ovs-vsctl list-ifaces br0
tun0 vovsbr vxlan0

can you attach the openshift journal output?

Created attachment 1090664
node journal logs

Nov 06 15:57:37 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[5186]: E1106 15:57:37.835561 5186 common.go:535] Error fetching Net ID for namespace: wewang2, skipped netNsEvent: &{ADDED wewang2 15}
Nov 06 16:01:06 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[5186]: I1106 16:01:06.797368 5186 manager.go:1451] Container "95bc279983850eaf2b2a8af99850972974f6d24587ee6584ee5e37b461875e97 ruby-helloworld-database wewang2/database-1-lj377" exited after 1.148343426s
Nov 06 16:01:06 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[5186]: E1106 16:01:06.797478 5186 manager.go:1342] Failed tearing down the infra container: Error fetching VNID for namespace: wewang2
Nov 06 16:01:06 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[5186]: I1106 16:01:06.797936 5186 manager.go:1451] Container "95bc279983850eaf2b2a8af99850972974f6d24587ee6584ee5e37b461875e97 /" exited after 1.132851637s
Nov 06 16:01:06 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[5186]: W1106 16:01:06.797959 5186 manager.go:1457] No ref for pod '"95bc279983850eaf2b2a8af99850972974f6d24587ee6584ee5e37b461875e97 /"'
Nov 06 16:01:06 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[5186]: E1106 16:01:06.797979 5186 manager.go:1342] Failed tearing down the infra container: Error fetching VNID for namespace: wewang2
...
Nov 06 16:01:05 openshift-155.lab.eng.nay.redhat.com atomic-openshift-node[5186]: E1106 16:01:05.520164 5186 event.go:198] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"ruby-sample-build-1-build.14140e156d7b7f73", GenerateName:"", Namespace:"wewang2", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"wewang2", Name:"ruby-sample-build-1-build", UID:"3d7510ce-845c-11e5-8113-fa163e141094", APIVersion:"v1", ResourceVersion:"7061", FieldPath:"spec.containers{docker-build}"}, Reason:"Killing", Message:"Killing with docker id b0e83f3067e3", Source:api.EventSource{Component:"kubelet", Host:"openshift-155.lab.eng.nay.redhat.com"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63582393665, nsec:486684019, loc:(*time.Location)(0x4ad42e0)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63582393665, nsec:486684019, loc:(*time.Location)(0x4ad42e0)}}, Count:1}': 'Event "ruby-sample-build-1-build.14140e156d7b7f73" is forbidden: Unable to create new content in namespace wewang2 because it is being terminated.' (will not retry!)

-----------------

Perhaps we have a condition where the namespace got deleted, and so the nodes all got the VNID deletion event, and then *after* that they start deleting pods. But since the node no longer has the VNID in the cache, it fails the TearDownPod step.

But we don't even need the VNID for pod teardown, since we use the pod IP address (and previously cookies). I think we should just remove the bits in TearDownPod that get the VNID and if we do need it in the future, figure out how to get it then.

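The ordering problem described above can be sketched in a few lines of Go. This is only an illustration of the suspected race, not the openshift-sdn code: `vnidCache`, `tearDownWithVNID`, `tearDownByIP`, the port-to-IP map, and the address `10.1.0.5` are all hypothetical names and values invented for the example.

```go
package main

import (
	"fmt"
)

// vnidCache is a hypothetical stand-in for the node's namespace -> VNID cache.
type vnidCache map[string]uint32

// tearDownWithVNID models the failing path: teardown aborts as soon as the
// namespace's VNID is missing from the cache, so the port is never removed.
func tearDownWithVNID(cache vnidCache, namespace, podIP string, ports map[string]string) error {
	if _, ok := cache[namespace]; !ok {
		return fmt.Errorf("error fetching VNID for namespace: %s", namespace)
	}
	return tearDownByIP(podIP, ports)
}

// tearDownByIP models the proposed path: the port is identified by pod IP
// alone. ports maps a (hypothetical) OVS port name to the pod IP behind it.
func tearDownByIP(podIP string, ports map[string]string) error {
	for name, ip := range ports {
		if ip == podIP {
			delete(ports, name)
			return nil
		}
	}
	return fmt.Errorf("no port found for pod IP %s", podIP)
}

func main() {
	// The namespace was deleted first, so its VNID is already gone.
	cache := vnidCache{}
	ports := map[string]string{"veth2795fbc": "10.1.0.5"}

	err := tearDownWithVNID(cache, "wewang2", "10.1.0.5", ports)
	fmt.Println("with VNID lookup:", err, "| ports left:", len(ports))

	err = tearDownByIP("10.1.0.5", ports)
	fmt.Println("by pod IP only:", err, "| ports left:", len(ports))
}
```

In this model the VNID-dependent path leaves the port behind whenever the namespace event arrives before the pod deletion, while the IP-only path does not care about event ordering.
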
Upstream PR: https://github.com/openshift/openshift-sdn/pull/207

These PRs got merged, so the next openshift origin release should have the relevant fixes.

The fix works well on origin openshift v1.1-315-gdc545a5-dirty. Moving the bug to MODIFIED status and waiting for the OSE puddle.

The ovs-port was deleted once the pods were deleted on origin.

1) create some pods and check the ovs port
node1: #ovs-vsctl list-ifaces br0
tun0 veth2795fbc veth9c32332 vethe58a5cb vetheef0077 vovsbr vxlan0
node2: #ovs-vsctl list-ifaces br0
tun0 veth617b9ac vethbc10972 vethe36dc1e vovsbr vxlan0
node3: #ovs-vsctl list-ifaces br0
tun0 veth66c76b9 veth7f2fe81 veth943d2d4 vovsbr vxlan0

2) delete pods and check ovs port again
node1: #ovs-vsctl list-ifaces br0
tun0 vovsbr vxlan0
node2: #ovs-vsctl list-ifaces br0
tun0 vovsbr vxlan0
node3: #ovs-vsctl list-ifaces br0
tun0 vovsbr vxlan0

It looks like the fix is in atomic-openshift-3.1.0.4.git.10.ec10652 (2015-11-30) and possibly earlier versions. Do you need to test again with the OSE puddle versions or is this bug good to go?

Verified and passed on atomic-openshift-3.1.0.902.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070
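
The before/after check used in the verification steps above could be automated roughly as follows. This is a sketch under assumptions: the helper `podPorts` is hypothetical, and it assumes pod ports are the ones whose names start with `veth` (as in the listings above), with `tun0`, `vovsbr`, and `vxlan0` treated as infrastructure ports. It works on captured `ovs-vsctl list-ifaces br0` output rather than running the command itself.

```go
package main

import (
	"fmt"
	"strings"
)

// podPorts returns the interface names in captured `ovs-vsctl list-ifaces`
// output that look like pod ports. Assumption: pod ports use a "veth" prefix.
func podPorts(listIfacesOutput string) []string {
	var pods []string
	// strings.Fields splits on any whitespace, so it handles names separated
	// by newlines as well as by spaces.
	for _, name := range strings.Fields(listIfacesOutput) {
		if strings.HasPrefix(name, "veth") {
			pods = append(pods, name)
		}
	}
	return pods
}

func main() {
	// Interface names taken from the node1 listings in this report.
	before := "tun0 veth2795fbc veth9c32332 vethe58a5cb vetheef0077 vovsbr vxlan0"
	after := "tun0 vovsbr vxlan0"

	fmt.Println("pod ports before deleting pods:", podPorts(before))
	fmt.Println("pod ports after deleting pods:", podPorts(after))

	if leftover := podPorts(after); len(leftover) > 0 {
		fmt.Println("stale ports remain:", leftover)
	} else {
		fmt.Println("no stale pod ports on br0")
	}
}
```

Any `veth` name still listed after all pods on the node are gone would indicate the leak this bug describes.
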