Bug 1562731
Summary: | OVN L3HA when creating 2 routers they scheduled to same controller node | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eran Kuris <ekuris> |
Component: | python-networking-ovn | Assignee: | Daniel Alvarez Sanchez <dalvarez> |
Status: | CLOSED ERRATA | QA Contact: | Eran Kuris <ekuris> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 13.0 (Queens) | CC: | amuller, apevec, bcafarel, dalvarez, jschluet, lhh, majopela, nyechiel, samccann |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | 13.0 (Queens) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | python-networking-ovn-4.0.1-0.20180420150809.c7c16d4.el7ost | Doc Type: | Bug Fix |
Doc Text: |
The current L3 HA scheduler was not taking the priorities of the nodes into consideration. Therefore, all gateways were being hosted by the same node and the load was not distributed across candidates.
This fix implements an algorithm to select the least loaded node when scheduling a gateway router. Gateway ports are now scheduled on the least loaded network node, which distributes the load evenly across the nodes.
|
Last Closed: | 2018-06-27 13:49:35 UTC | Type: | Bug |
Description
Eran Kuris
2018-04-02 09:39:26 UTC
When I looked at the scheduler, I saw that it is using the least used scheduler, and that it is not working as expected. As far as I understand from the code, it lists all the chassis that have a gw router port scheduled, but it does not take into account the priorities of the gateway chassis. We should make sure we use the priority in the calculation; otherwise, all the chassis (master or backup) are equally weighted.

Merged on master and proposed to stable/queens.

Fix verified:

    cat /etc/yum.repos.d/latest-installed
    13 -p 2018-04-26.3
    (overcloud) [root@controller-0 ~]# rpm -qa |grep python-networking-ovn
    python-networking-ovn-4.0.1-0.20180420150809.c7c16d4.el7ost.noarch
    python-networking-ovn-metadata-agent-4.0.1-0.20180420150809.c7c16d4.el7ost.noarch

I created 3 routers and verified that they were scheduled on different controller nodes. It now looks like we are taking the priorities of the gateway chassis into account. I also ran a connectivity check to each router's external interface and to an instance attached to the router.

    router fa4d44f5-669a-41ce-a0f3-51b127aaf1c0 (neutron-f7df49ce-69be-4fac-b476-22de9ece4cd1) (aka Router_eNet)
        port lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48
            mac: "fa:16:3e:e4:71:9f"
            networks: ["10.0.0.217/24"]
            gateway chassis: [37601a52-d66a-4eac-be13-b9f93095ebf1 21762b93-5d6c-4684-ac52-6018d9d35217 95b77591-d3e9-4a79-b7b6-1e817c4faa48]
    router c8ac6efa-395d-48d0-906f-1bc4404070a9 (neutron-94c445b3-5912-455f-9708-e90c8ab50b73) (aka Router_eNet_2)
        port lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf
            mac: "fa:16:3e:4f:60:ed"
            networks: ["10.0.0.214/24"]
            gateway chassis: [37601a52-d66a-4eac-be13-b9f93095ebf1 21762b93-5d6c-4684-ac52-6018d9d35217 95b77591-d3e9-4a79-b7b6-1e817c4faa48]
    router 33d15737-a35b-4251-9f0b-672a3f52071c (neutron-c7f7ddc8-2a7c-4ecc-8c46-17713a39b9ca) (aka Router_eNet_3)
        port lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8
            mac: "fa:16:3e:71:c4:0c"
            networks: ["10.0.0.211/24"]
            gateway chassis: [21762b93-5d6c-4684-ac52-6018d9d35217 37601a52-d66a-4eac-be13-b9f93095ebf1 95b77591-d3e9-4a79-b7b6-1e817c4faa48]

    (overcloud) [root@controller-0 ~]# ovn-nbctl lrp-get-gateway-chassis lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48
    lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_21762b93-5d6c-4684-ac52-6018d9d35217 3
    lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_37601a52-d66a-4eac-be13-b9f93095ebf1 2
    lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_95b77591-d3e9-4a79-b7b6-1e817c4faa48 1
    (overcloud) [root@controller-0 ~]# ovn-nbctl lrp-get-gateway-chassis lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf
    lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_95b77591-d3e9-4a79-b7b6-1e817c4faa48 3
    lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_37601a52-d66a-4eac-be13-b9f93095ebf1 2
    lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_21762b93-5d6c-4684-ac52-6018d9d35217 1
    (overcloud) [root@controller-0 ~]# ovn-nbctl lrp-get-gateway-chassis lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8
    lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_37601a52-d66a-4eac-be13-b9f93095ebf1 3
    lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_21762b93-5d6c-4684-ac52-6018d9d35217 2
    lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_95b77591-d3e9-4a79-b7b6-1e817c4faa48 1

    (overcloud) [root@controller-0 ~]# ping 10.0.0.211
    PING 10.0.0.211 (10.0.0.211) 56(84) bytes of data.
    64 bytes from 10.0.0.211: icmp_seq=1 ttl=254 time=2.50 ms
    64 bytes from 10.0.0.211: icmp_seq=2 ttl=254 time=0.458 ms
    --- 10.0.0.211 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1006ms
    rtt min/avg/max/mdev = 0.458/1.483/2.508/1.025 ms
    (overcloud) [root@controller-0 ~]# ping 10.0.0.214
    PING 10.0.0.214 (10.0.0.214) 56(84) bytes of data.
    64 bytes from 10.0.0.214: icmp_seq=1 ttl=254 time=0.741 ms
    64 bytes from 10.0.0.214: icmp_seq=2 ttl=254 time=0.239 ms
    ^C
    --- 10.0.0.214 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1000ms
    rtt min/avg/max/mdev = 0.239/0.490/0.741/0.251 ms
    (overcloud) [root@controller-0 ~]# ping 10.0.0.217
    PING 10.0.0.217 (10.0.0.217) 56(84) bytes of data.
    64 bytes from 10.0.0.217: icmp_seq=1 ttl=254 time=0.969 ms
    64 bytes from 10.0.0.217: icmp_seq=2 ttl=254 time=0.397 ms
    ^C
    --- 10.0.0.217 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1003ms
    rtt min/avg/max/mdev = 0.397/0.683/0.969/0.286 ms

    # ovn-sbctl show
    Chassis "21762b93-5d6c-4684-ac52-6018d9d35217"
        hostname: "controller-1.localdomain"
        Encap geneve
            ip: "172.17.2.16"
            options: {csum="true"}
        Port_Binding "cr-lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48"
    Chassis "95b77591-d3e9-4a79-b7b6-1e817c4faa48"
        hostname: "controller-0.localdomain"
        Encap geneve
            ip: "172.17.2.13"
            options: {csum="true"}
        Port_Binding "cr-lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf"
    Chassis "37601a52-d66a-4eac-be13-b9f93095ebf1"
        hostname: "controller-2.localdomain"
        Encap geneve
            ip: "172.17.2.12"
            options: {csum="true"}
        Port_Binding "cr-lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8"

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
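
For readers who want to see what "using the priority in the calculation" might look like, here is a minimal sketch. It is not the networking-ovn code: the function name schedule_gateway, the shape of the bindings dictionary, and the port names lrp-a, lrp-b and lrp-c are all made up for illustration. The assumed idea is to walk the priority levels from highest to lowest and, at each level, pick the candidate chassis that currently hosts the fewest gateway ports at that same priority.

```python
from collections import Counter


def schedule_gateway(candidates, existing_bindings):
    """Return [(chassis, priority), ...] for one new gateway port.

    candidates: chassis names eligible to host gateways (hypothetical input).
    existing_bindings: dict of port name -> list of (chassis, priority).
    """
    remaining = list(candidates)
    result = []
    # Walk priorities from highest (hosts the gateway) down to the last backup.
    for priority in range(len(candidates), 0, -1):
        # Count how many existing ports each chassis holds at this priority.
        load = Counter(
            chassis
            for bindings in existing_bindings.values()
            for chassis, prio in bindings
            if prio == priority
        )
        # Least loaded chassis at this priority; ties go to candidate order.
        chosen = min(remaining, key=lambda c: load[c])
        result.append((chosen, priority))
        remaining.remove(chosen)
    return result


CONTROLLERS = ["controller-0", "controller-1", "controller-2"]
bindings = {}
for port in ("lrp-a", "lrp-b", "lrp-c"):
    bindings[port] = schedule_gateway(CONTROLLERS, bindings)

# Highest-priority chassis per port, i.e. the one expected to host it.
actives = {port: max(b, key=lambda cp: cp[1])[0] for port, b in bindings.items()}
print(actives)
```

With three controllers and three ports scheduled one after another, each port in this sketch ends up with a different highest-priority chassis, which matches the distribution seen in the verification output. A scheduler that only counted how many ports reference each chassis, regardless of priority, would see every chassis as equally loaded here, which is the problem the description points out.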