Bug 1625995
| Summary: | [CI] no connectivity to public addresses via br-ex | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Waldemar Znoinski <wznoinsk> |
| Component: | opendaylight | Assignee: | Mike Kolesnik <mkolesni> |
| Status: | CLOSED DUPLICATE | QA Contact: | Noam Manos <nmanos> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 14.0 (Rocky) | CC: | aadam, abregman, mkolesni, nyechiel |
| Target Milestone: | --- | Keywords: | AutomationBlocker |
| Target Release: | --- | Flags: | abregman: needinfo- |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | N/A | |
| Last Closed: | 2018-09-17 12:34:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1626488 | ||
| Bug Blocks: | |||
Arie, are you seeing this on your OSP 14 CI jobs as well? This seems to me like a general deployment issue which might not be related to ODL.

More observations:

1. There's no communication of overcloud nodes with any 10.0.0.0/24 (i.e.: undercloud), so tempest can't even start.

2. The ovs-vswitchd process dies on controllers, logfile:

        ...
        2018-09-06T12:41:42.473Z|00056|connmgr|INFO|br-isolated: added service controller "punix:/var/run/openvswitch/br-isolated.mgmt"
        2018-09-06T12:41:42.503Z|00057|rconn|INFO|br-int<->tcp:172.17.1.29:6653: connected
        2018-09-06T12:41:42.520Z|00058|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.10.0
        2018-09-06T12:41:43.012Z|00059|connmgr|INFO|br-int<->tcp:172.17.1.29:6653: sending OFPGMFC_GROUP_EXISTS error reply to OFPT_GROUP_MOD message
        2018-09-06T12:41:43.167Z|00060|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connection timed out
        2018-09-06T12:41:43.167Z|00061|rconn|INFO|br-int<->tcp:172.17.1.21:6653: waiting 1 seconds before reconnect
        2018-09-06T12:41:43.167Z|00062|rconn|INFO|br-int<->tcp:172.17.1.10:6653: connection timed out
        2018-09-06T12:41:43.167Z|00063|rconn|INFO|br-int<->tcp:172.17.1.10:6653: waiting 1 seconds before reconnect
        2018-09-06T12:41:44.166Z|00064|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connecting...
        2018-09-06T12:41:44.166Z|00065|rconn|INFO|br-int<->tcp:172.17.1.10:6653: connecting...
        2018-09-06T12:41:45.166Z|00066|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connection timed out
        2018-09-06T12:41:45.166Z|00067|rconn|INFO|br-int<->tcp:172.17.1.21:6653: waiting 2 seconds before reconnect
        2018-09-06T12:41:45.166Z|00068|rconn|INFO|br-int<->tcp:172.17.1.10:6653: connection timed out
        2018-09-06T12:41:45.166Z|00069|rconn|INFO|br-int<->tcp:172.17.1.10:6653: waiting 2 seconds before reconnect
        2018-09-06T12:41:47.167Z|00070|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connecting...
        2018-09-06T12:41:47.167Z|00071|rconn|INFO|br-int<->tcp:172.17.1.10:6653: connecting...
        2018-09-06T12:41:48.641Z|00001|util(handler29)|EMER|./include/openvswitch/list.h:261: assertion !ovs_list_is_empty(list) failed in ovs_list_back()

3. There's no communication between overcloud nodes themselves on vlan10/20/40/50 subnets; there is communication working on vlan30.

After looking at it with Sridhar, it looks like:

1. openvswitch 2.10 (in OSP14) behaves differently than the previously used 2.9 (OSP13). 2.10 dies with:

        util(handler28)|EMER|./include/openvswitch/list.h:261: assertion !ovs_list_is_empty(list) failed in ovs_list_back()

   reported as https://bugzilla.redhat.com/show_bug.cgi?id=1626488

2. There are "Unexpected exceptions" due to features in ovs2.10 not yet handled by oxygen: https://bugzilla.redhat.com/show_bug.cgi?id=1626497 (may not be directly related to this external connectivity issue)

After restarting ovs-vswitchd on controllers the external connectivity works for a while, then ovs-vswitchd dies again (because of bug 1. above) and there is no external connectivity again.

As a test we've tried installing ovs 2.9 (from osp13) instead of 2.10 (osp14) and everything works fine, even after a longer period of time.

*** This bug has been marked as a duplicate of bug 1626488 ***
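For anyone checking whether their own controllers hit the same ovs-vswitchd crash, the following is a minimal sketch that scans log text for EMER entries like the assertion quoted above. It assumes the pipe-separated layout visible in the pasted log (timestamp, sequence number, module, level, message); the names `LogEntry`, `parse_line` and `find_emergencies` are made up for illustration and are not part of Open vSwitch.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    seq: str
    module: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its five pipe-separated fields.

    Returns None for lines that do not match the assumed layout
    (for example the "..." elision in the pasted log).
    """
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    return LogEntry(*parts)


def find_emergencies(lines: Iterable[str]) -> List[LogEntry]:
    """Return every entry logged at EMER level."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level == "EMER"]


# Sample taken from the log excerpt in this bug.
SAMPLE = """\
...
2018-09-06T12:41:47.167Z|00070|rconn|INFO|br-int<->tcp:172.17.1.21:6653: connecting...
2018-09-06T12:41:47.167Z|00071|rconn|INFO|br-int<->tcp:172.17.1.10:6653: connecting...
2018-09-06T12:41:48.641Z|00001|util(handler29)|EMER|./include/openvswitch/list.h:261: assertion !ovs_list_is_empty(list) failed in ovs_list_back()
"""

if __name__ == "__main__":
    for entry in find_emergencies(SAMPLE.splitlines()):
        print(entry.timestamp, entry.module, entry.message)
```

To run it against a real log, pass the open file object to `find_emergencies` in place of `SAMPLE.splitlines()`; the log file location depends on the deployment and is not stated in this bug.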
Description of problem:
After deploying OSP14 + ODL, overcloud nodes (e.g. controller-0) have no connectivity to public addresses.

controller-0:

    [root@controller-0 ~]# ip -o a sh br-ex
    7: br-ex inet 10.0.0.108/24 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever
    7: br-ex inet 10.0.0.101/32 brd 10.0.0.255 scope global br-ex\ valid_lft forever preferred_lft forever
    7: br-ex inet6 fe80::5054:ff:fe53:7e69/64 scope link \ valid_lft forever preferred_lft forever
    [root@controller-0 ~]# ip r
    default via 10.0.0.1 dev br-ex
    10.0.0.0/24 dev br-ex proto kernel scope link src 10.0.0.108
    169.254.169.254 via 192.168.24.1 dev eth0
    172.17.1.0/24 dev vlan20 proto kernel scope link src 172.17.1.29
    172.17.2.0/24 dev vlan50 proto kernel scope link src 172.17.2.24
    172.17.3.0/24 dev vlan30 proto kernel scope link src 172.17.3.14
    172.17.4.0/24 dev vlan40 proto kernel scope link src 172.17.4.15
    172.31.0.0/24 dev docker0 proto kernel scope link src 172.31.0.1
    192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.15

ping controller-0 -> compute-0

    [root@controller-0 ~]# ping -c 3 10.0.0.105
    PING 10.0.0.105 (10.0.0.105) 56(84) bytes of data.
    From 10.0.0.108 icmp_seq=1 Destination Host Unreachable
    From 10.0.0.108 icmp_seq=2 Destination Host Unreachable
    From 10.0.0.108 icmp_seq=3 Destination Host Unreachable

ping controller-0 -> undercloud-0

    [root@controller-0 ~]# ping -c 3 10.0.0.13
    PING 10.0.0.13 (10.0.0.13) 56(84) bytes of data.
    From 10.0.0.108 icmp_seq=1 Destination Host Unreachable
    From 10.0.0.108 icmp_seq=2 Destination Host Unreachable
    From 10.0.0.108 icmp_seq=3 Destination Host Unreachable

ping controller-0 -> host (physical server controller-0 is a VM on)

    [root@controller-0 ~]# ping -c 3 10.0.0.1
    PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
    From 10.0.0.108 icmp_seq=1 Destination Host Unreachable
    From 10.0.0.108 icmp_seq=2 Destination Host Unreachable
    From 10.0.0.108 icmp_seq=3 Destination Host Unreachable

The same problem exists on the other controllers. We don't use 10.0.0.X IPs on computes.

Version-Release number of selected component (if applicable):
osp14 (puddle 2018-08-23.3) + opendaylight-8.3.0-3

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
ping not working

Expected results:
ping to work

Additional info:

    [root@controller-0 ~]# ovs-vsctl show
    d45f7d11-db46-48c0-a7ab-f7d468d85869
        Manager "tcp:172.17.1.29:6640"
            is_connected: true
        Manager "tcp:172.17.1.10:6640"
            is_connected: true
        Manager "tcp:172.17.1.21:6640"
            is_connected: true
        Manager "ptcp:6639:127.0.0.1"
        Bridge br-isolated
            fail_mode: standalone
            Port "vlan40"
                tag: 40
                Interface "vlan40"
                    type: internal
            Port br-isolated
                Interface br-isolated
                    type: internal
            Port "eth1"
                Interface "eth1"
            Port "vlan20"
                tag: 20
                Interface "vlan20"
                    type: internal
            Port "vlan30"
                tag: 30
                Interface "vlan30"
                    type: internal
            Port "vlan50"
                tag: 50
                Interface "vlan50"
                    type: internal
        Bridge br-ex
            fail_mode: standalone
            Port "eth2"
                Interface "eth2"
            Port br-ex
                Interface br-ex
                    type: internal
            Port br-ex-int-patch
                Interface br-ex-int-patch
                    type: patch
                    options: {peer=br-ex-patch}
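A note on reading the ping results in the description: every "Destination Host Unreachable" comes from 10.0.0.108, controller-0's own br-ex address, and all three targets sit inside the 10.0.0.0/24 route that `ip r` shows as directly attached to br-ex. That usually suggests the failure is at layer 2 on br-ex (no neighbour resolution) rather than a routing problem, although this is an interpretation and not something the report states. The sketch below only checks the on-link part with Python's standard `ipaddress` module; the addresses are copied from the description, and `on_link` is a hypothetical helper name.

```python
import ipaddress

# Directly connected network on br-ex, from the `ip r` output in the description.
BR_EX_NETWORK = ipaddress.ip_network("10.0.0.0/24")

# Ping targets from the description.
TARGETS = {
    "compute-0": "10.0.0.105",
    "undercloud-0": "10.0.0.13",
    "host": "10.0.0.1",
}


def on_link(address: str, network=BR_EX_NETWORK) -> bool:
    """Return True if the address falls inside the directly connected network."""
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    for name, address in TARGETS.items():
        status = "on-link via br-ex" if on_link(address) else "routed via gateway"
        print(f"{name} ({address}): {status}")
```

Because none of the targets needs the default gateway, the default route itself is unlikely to be the cause; this is consistent with the comments pointing at ovs-vswitchd dying on the controllers.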