Bug 1389212
| Summary: | all pod veth are removed after restarting openvswitch | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongan Li <hongli> |
| Component: | Networking | Assignee: | Dan Winship <danw> |
| Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.4.0 | CC: | anli, aos-bugs, bbennett, hongli, tdawson |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-18 12:46:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The issue can still be reproduced in OCP 3.4.0.17 with OVS 2.5.0:

```
openshift v3.4.0.17+b8a03bc
kubernetes v1.4.0+776c994
ovs-vsctl (Open vSwitch) 2.5.0
```

---

Hm... worksforme:

```
[root@openshift-node-2 /]# ovs-ofctl -O OpenFlow13 show br0 | grep veth
 3(veth642b9ac1): addr:fa:9f:07:8b:a8:02
[root@openshift-node-2 /]# systemctl restart openvswitch
[root@openshift-node-2 /]# ovs-ofctl -O OpenFlow13 show br0 | grep veth
 3(veth642b9ac1): addr:fa:9f:07:8b:a8:02
```

`ps` shows that ovsdb-server and ovs-vswitchd are being restarted.

---

The issue can still be reproduced in OCP 3.4.0.18. Note that the node status stays "NotReady" for about 10 minutes after restarting openvswitch, and no veth shows during this time. Seems it is related to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1388867

---

(In reply to hongli from comment #3)
> Seems it is related to this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1388867

OK, let's wait for that to get fixed and see if this is still reproducible.

---

For upgrade, I hit the same issue; the steps are as follows:

1. Install OCP 3.3, create pods, etc.
2. Upgrade to OCP 3.4 and openvswitch 2.4.
3. Upgrade openvswitch to 2.5 and restart openvswitch.
4. Check the existing pods: some pods cannot be accessed.

---

(In reply to Dan Winship from comment #5)
> (In reply to hongli from comment #3)
> > Seems it is related to this bug:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1388867
>
> OK, let's wait for that to get fixed and see if this is still reproducible

That bug is fixed in the current packages. Is this bug still reproducible *without* upgrading OVS in the middle?

---

The bug can still be reproduced in OCP 3.4.0.23, *with* or *without* upgrading OVS. But it is not 100% reproducible now and I didn't find any clue; it seems openvswitch needs to be restarted several times to reproduce the issue. I have reserved the two environments for your debugging.

---

Ah, it's not actually especially OVS-related at all; it's just a badly-timed crash at startup.
(OpenShift gets restarted when OVS gets restarted (which is correct), sees that it needs to recreate the SDN, and starts doing so. But when it gets to the part where it starts reattaching the pods, it hits a bug and panics. Systemd then restarts it, and this time it sees that the OVS bridge already exists and so skips SDN setup. As a result, the old pods are never reattached.)

This should be tested on OSE v3.4.0.25 or newer.

---

Verified in v3.4.0.25; the bug has been fixed.

---

Verified and fixed in the v3.4.0.25 containerized env, but the issue can still be reproduced in the rpm installation env with v3.4.0.26, so I am assigning the bug back. Leaving the env for debugging.

---

Verified in the 3.4.0.27 rpm installation env; cannot reproduce the issue.

```
openshift v3.4.0.27+dffa95c
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
```

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066
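The failure mode described above (the bridge survives the crash, so the restarted node process skips SDN setup and never reattaches the pod veths) can be checked for on a node with a small diagnostic. This is a hedged sketch, not from the report: the `veth_count` helper and the zero-veth heuristic are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: detect the state described above -- br0 already exists (so the
# restarted node process skipped SDN setup) but no pod veths are attached.
# The helper name and the zero-veth heuristic are assumptions for
# illustration; a node legitimately running zero pods would also match.

# Count veth ports in `ovs-ofctl show` output read from stdin.
veth_count() {
    grep -c '(veth' || true   # grep -c still prints 0 when nothing matches
}

if ovs-vsctl br-exists br0; then
    n=$(ovs-ofctl -O OpenFlow13 show br0 | veth_count)
    if [ "$n" -eq 0 ]; then
        echo "br0 exists but no pod veths are attached; node may have hit this bug"
    fi
fi
```

A node that hit the crash/skip sequence would report an existing br0 with only vxlan0/tun0 ports, matching the "after restart" output in the original description.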
Description of problem:
The pod veths are removed after restarting openvswitch.

Version-Release number of selected component (if applicable):

```
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
ovs-vsctl (Open vSwitch) 2.4.0
```

How reproducible: always

Steps to Reproduce:
1. Set up a multi-node env with the multitenant network plugin.
2. Create some pods on the node.
3. SSH to the node and check the pod veths.
4. `systemctl restart openvswitch`

Actual results:
The pod veths are removed after openvswitch is restarted, and pods on the node cannot be reached. Details below.

Before restarting openvswitch:

```
[root@ip-172-18-4-193 ~]# ovs-ofctl -O OpenFlow13 show br0
OFPT_FEATURES_REPLY (OF1.3) (xid=0x2): dpid:000012f02b5a5444
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS GROUP_STATS QUEUE_STATS
OFPST_PORT_DESC reply (OF1.3) (xid=0x3):
 1(vxlan0): addr:d2:cc:76:b6:c7:bf
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(tun0): addr:1a:89:3b:6e:ec:27
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 5(vethfe66b7f6): addr:ce:0a:1d:40:14:c0
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 18(vethf897b3ae): addr:62:26:5d:46:77:c2
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 22(veth3f8d481c): addr:82:d5:e2:48:ed:f1
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 23(veth4a0ef39a): addr:0e:50:24:80:5c:70
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 24(veth86b0da34): addr:da:0f:63:c0:42:7f
     config:     0
     state:      0
     current:    10GB-FD COPPER
     speed: 10000 Mbps now, 0 Mbps max
 LOCAL(br0): addr:12:f0:2b:5a:54:44
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (OF1.3) (xid=0x5): frags=normal miss_send_len=0
```

After restarting openvswitch:

```
[root@ip-172-18-4-193 ~]# ovs-ofctl -O OpenFlow13 show br0
OFPT_FEATURES_REPLY (OF1.3) (xid=0x2): dpid:000012f02b5a5444
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS GROUP_STATS QUEUE_STATS
OFPST_PORT_DESC reply (OF1.3) (xid=0x3):
 1(vxlan0): addr:5a:b5:7a:ba:53:c5
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(tun0): addr:4a:ff:34:22:0c:8c
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:12:f0:2b:5a:54:44
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (OF1.3) (xid=0x5): frags=normal miss_send_len=0
```

Expected results:
The pod veths should not be removed after restarting openvswitch.

Additional info:
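The reproduction steps above can be scripted as a quick before/after check. This is a hedged sketch, not part of the original report: the `count_veths` helper and the 30-second settle interval are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: compare the number of veth ports on br0 before and after
# restarting openvswitch. The helper name and the sleep interval are
# assumptions for illustration, not from the report.

# Count veth ports in `ovs-ofctl show` output read from stdin.
count_veths() {
    grep -c '(veth' || true   # grep -c still prints 0 when nothing matches
}

before=$(ovs-ofctl -O OpenFlow13 show br0 | count_veths)
systemctl restart openvswitch
sleep 30   # give the node process time to resync after the restart

after=$(ovs-ofctl -O OpenFlow13 show br0 | count_veths)
echo "veth ports: before=$before after=$after"
if [ "$after" -lt "$before" ]; then
    echo "pod veths were lost after the OVS restart (this bug)"
fi
```

On an affected node this would report fewer veth ports after the restart, matching the before/after `ovs-ofctl` output in the description.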