Description of problem:
The least-loaded scheduler does not work: router scheduling is not balanced on OVN L3 HA. We take every chassis in each gateway chassis list and add it to the list instead of taking only the highest-priority one. We should weight by priority. In the output below, both router ports have chassis 942750fc-cec5-4a9f-aeb5-6dfddf9be3be (controller-0) at priority 3, and both cr-lrp ports are bound to that chassis.

(overcloud) [root@controller-0 ~]# ovn-nbctl --db=tcp:172.17.1.15:6641 lrp-get-gateway-chassis lrp-6042c7e2-79b3-4925-b606-b86c6dc1e824
lrp-6042c7e2-79b3-4925-b606-b86c6dc1e824_942750fc-cec5-4a9f-aeb5-6dfddf9be3be     3
lrp-6042c7e2-79b3-4925-b606-b86c6dc1e824_113644ed-b3c6-47f2-9488-984d37936c97     2
lrp-6042c7e2-79b3-4925-b606-b86c6dc1e824_a34f57de-09d3-4c1f-b56b-270eb850537a     1
(overcloud) [root@controller-0 ~]# ovn-nbctl --db=tcp:172.17.1.15:6641 lrp-get-gateway-chassis lrp-284190ed-ff6a-438b-b9ee-a843f13edbd6
lrp-284190ed-ff6a-438b-b9ee-a843f13edbd6_942750fc-cec5-4a9f-aeb5-6dfddf9be3be     3
lrp-284190ed-ff6a-438b-b9ee-a843f13edbd6_113644ed-b3c6-47f2-9488-984d37936c97     2
lrp-284190ed-ff6a-438b-b9ee-a843f13edbd6_a34f57de-09d3-4c1f-b56b-270eb850537a     1

(overcloud) [root@controller-0 ~]# ovn-nbctl --db=tcp:172.17.1.15:6641 show
switch 31990f00-c41e-466e-9070-bf3760b58926 (neutron-7b8f0751-6907-408a-8997-89747009fd09) (aka net-64-2)
    port 6a9c85b2-8a8e-470b-b50f-7ae7c3380b03
        type: localport
        addresses: ["fa:16:3e:85:ae:47 10.0.2.2"]
    port a0cc0b12-70d5-46c9-8e00-e76e970c711f
        addresses: ["fa:16:3e:42:d6:89 10.0.2.8"]
    port 580a8d2c-eaa0-48f0-a7e8-8c379abb8b29
        type: router
        router-port: lrp-580a8d2c-eaa0-48f0-a7e8-8c379abb8b29
switch 7bb30649-71dc-405f-9220-37f7f80f855f (neutron-88236779-29ef-46aa-bc6b-80d8f0f15b45) (aka nova)
    port 2ae28cbb-8ced-4158-ac3a-7f43cf520ee7
        type: localport
        addresses: ["fa:16:3e:18:b4:cd"]
    port 6042c7e2-79b3-4925-b606-b86c6dc1e824
        type: router
        router-port: lrp-6042c7e2-79b3-4925-b606-b86c6dc1e824
    port 284190ed-ff6a-438b-b9ee-a843f13edbd6
        type: router
        router-port: lrp-284190ed-ff6a-438b-b9ee-a843f13edbd6
    port provnet-88236779-29ef-46aa-bc6b-80d8f0f15b45
        type: localnet
        addresses: ["unknown"]
switch 26f1fe62-b330-47a6-8527-0d098a2239ac (neutron-6484b473-5e68-440e-9d90-a53e42fe9dc2) (aka net-64-3)
    port 783de96f-ed69-4d3f-83a3-afa2560a7e02
        type: router
        router-port: lrp-783de96f-ed69-4d3f-83a3-afa2560a7e02
    port d12c0cd5-b818-484a-ac0f-70222b15b0cd
        addresses: ["fa:16:3e:cb:69:c1 10.0.3.9"]
    port 53afc813-7488-47fe-ba2d-9047577e9ce3
        addresses: ["fa:16:3e:33:c2:e6 10.0.3.10"]
    port accbd0cb-be25-4f96-8e5b-59e3f473871d
        type: localport
        addresses: ["fa:16:3e:04:03:20 10.0.3.2"]
router ed8829a4-4206-4410-983d-df2e88790121 (neutron-9b83b3ff-e802-4e2a-8c36-1918b6355c7a) (aka Router_eNet_2)
    port lrp-6042c7e2-79b3-4925-b606-b86c6dc1e824
        mac: "fa:16:3e:0a:22:a5"
        networks: ["10.0.0.220/24"]
        gateway chassis: [113644ed-b3c6-47f2-9488-984d37936c97 a34f57de-09d3-4c1f-b56b-270eb850537a 942750fc-cec5-4a9f-aeb5-6dfddf9be3be]
router 0769eb6f-60ed-451a-af57-8ea56c257fda (neutron-cb989bd4-f821-46b4-b556-b499dd64d5c7) (aka Router_eNet)
    port lrp-284190ed-ff6a-438b-b9ee-a843f13edbd6
        mac: "fa:16:3e:53:26:19"
        networks: ["10.0.0.214/24"]
        gateway chassis: [a34f57de-09d3-4c1f-b56b-270eb850537a 113644ed-b3c6-47f2-9488-984d37936c97 942750fc-cec5-4a9f-aeb5-6dfddf9be3be]
    port lrp-783de96f-ed69-4d3f-83a3-afa2560a7e02
        mac: "fa:16:3e:0c:8e:28"
        networks: ["10.0.3.1/24"]
    port lrp-580a8d2c-eaa0-48f0-a7e8-8c379abb8b29
        mac: "fa:16:3e:c3:0a:b0"
        networks: ["10.0.2.1/24"]
    nat 1801d558-fe18-4015-96c7-6998160c64f5
        external ip: "10.0.0.218"
        logical ip: "10.0.3.9"
        type: "dnat_and_snat"
    nat 46c19fad-c450-490f-8255-66bb3c1f715f
        external ip: "10.0.0.214"
        logical ip: "10.0.2.0/24"
        type: "snat"
    nat b81c0ac9-6e19-4beb-88aa-3c1e120fe680
        external ip: "10.0.0.215"
        logical ip: "10.0.2.8"
        type: "dnat_and_snat"
    nat dce146ff-354b-4340-9607-49ee78d33be9
        external ip: "10.0.0.214"
        logical ip: "10.0.3.0/24"
        type: "snat"

(overcloud) [root@controller-0 ~]# ovn-sbctl --db=tcp:172.17.1.15:6642 show
Chassis "113644ed-b3c6-47f2-9488-984d37936c97"
    hostname: "controller-2.localdomain"
    Encap geneve
        ip: "172.17.2.21"
        options: {csum="true"}
Chassis "50bcbcc8-7f24-4383-9636-81c833ccc345"
    hostname: "compute-1.localdomain"
    Encap geneve
        ip: "172.17.2.18"
        options: {csum="true"}
    Port_Binding "d12c0cd5-b818-484a-ac0f-70222b15b0cd"
Chassis "942750fc-cec5-4a9f-aeb5-6dfddf9be3be"
    hostname: "controller-0.localdomain"
    Encap geneve
        ip: "172.17.2.20"
        options: {csum="true"}
    Port_Binding "cr-lrp-6042c7e2-79b3-4925-b606-b86c6dc1e824"
    Port_Binding "cr-lrp-284190ed-ff6a-438b-b9ee-a843f13edbd6"
Chassis "a34f57de-09d3-4c1f-b56b-270eb850537a"
    hostname: "controller-1.localdomain"
    Encap geneve
        ip: "172.17.2.10"
        options: {csum="true"}
Chassis "0407dcee-65c0-48c3-be8f-e6d7997c7613"
    hostname: "compute-0.localdomain"
    Encap geneve
        ip: "172.17.2.14"
        options: {csum="true"}
    Port_Binding "a0cc0b12-70d5-46c9-8e00-e76e970c711f"

Version-Release number of selected component (if applicable):
(overcloud) [root@controller-0 ~]# rpm -qa | grep ovn
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch
openvswitch-ovn-central-2.9.0-3.el7fdp.x86_64
novnc-0.6.1-1.el7ost.noarch
openvswitch-ovn-common-2.9.0-3.el7fdp.x86_64
openvswitch-ovn-host-2.9.0-3.el7fdp.x86_64
python-networking-ovn-metadata-agent-4.0.0-0.20180220131809.329d6d8.el7ost.noarch
python-networking-ovn-4.0.0-0.20180220131809.329d6d8.el7ost.noarch
openstack-nova-novncproxy-17.0.1-0.20180302144923.9ace6ed.el7ost.noarch
(overcloud) [root@controller-0 ~]# cat /etc/yum.repos.d/latest-installed
13 -p 2018-03-20.2

How reproducible:
always

Steps to Reproduce:
1. Create 2 networks.
2. Create 2 routers.
3. Check with the OVN commands where the routers are scheduled.
When I looked at the scheduler, I confirmed that it is using the least-loaded scheduler, and it is not working as expected. As far as I understand from the code, it lists all the chassis that have a gateway router port scheduled, but it does not take the priorities of the gateway chassis into account. We should make sure we use the priority in the calculation; otherwise, all the chassis (master or backup) are weighted equally.
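To illustrate the difference, here is a minimal sketch in Python. It is not the networking-ovn code; the function names, the shortened port labels, and the use of hostnames in place of chassis UUIDs are assumptions made for the example. It compares a load count that treats every chassis in a gateway chassis list equally with one that counts only the highest-priority chassis of each port, using the two ports from this report.

```python
from collections import Counter

# Gateway chassis lists from this report: port -> [(chassis, priority), ...].
# Hostnames stand in for the chassis UUIDs and the port names are shortened;
# both are labels chosen for this example only.
GATEWAY_CHASSIS = {
    "lrp-6042c7e2": [("controller-0", 3), ("controller-2", 2), ("controller-1", 1)],
    "lrp-284190ed": [("controller-0", 3), ("controller-2", 2), ("controller-1", 1)],
}


def load_ignoring_priority(gateway_chassis):
    """Count every chassis in every list, master or backup alike."""
    load = Counter()
    for entries in gateway_chassis.values():
        for chassis, _priority in entries:
            load[chassis] += 1
    return load


def load_by_highest_priority(gateway_chassis):
    """Count only the chassis holding the highest priority for each port."""
    load = Counter()
    for entries in gateway_chassis.values():
        if entries:
            chassis, _priority = max(entries, key=lambda entry: entry[1])
            load[chassis] += 1
    return load


def least_loaded(candidates, load):
    """Return the candidate with the lowest load; ties go to the first one."""
    return min(candidates, key=lambda chassis: load[chassis])


CANDIDATES = ["controller-0", "controller-1", "controller-2"]
flat = load_ignoring_priority(GATEWAY_CHASSIS)
weighted = load_by_highest_priority(GATEWAY_CHASSIS)

print(dict(flat))
print(dict(weighted))
print(least_loaded(CANDIDATES, flat))
print(least_loaded(CANDIDATES, weighted))
```

With the flat count all three controllers look equally loaded, so the tie-break decides and the already busy controller-0 can be picked again. Counting only the highest-priority chassis shows that controller-0 hosts both routers and the other two host none.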
Merged on master and proposed to stable/queens.
Fix verified:
cat /etc/yum.repos.d/latest-installed
13 -p 2018-04-26.3
(overcloud) [root@controller-0 ~]# rpm -qa |grep python-networking-ovn
python-networking-ovn-4.0.1-0.20180420150809.c7c16d4.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180420150809.c7c16d4.el7ost.noarch

I created 3 routers and verified that each one was scheduled on a different controller node. It now looks like we are taking the priorities of the gateway chassis into account. I also ran a connectivity check to each router's external interface and to an instance attached to the router.

router fa4d44f5-669a-41ce-a0f3-51b127aaf1c0 (neutron-f7df49ce-69be-4fac-b476-22de9ece4cd1) (aka Router_eNet)
    port lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48
        mac: "fa:16:3e:e4:71:9f"
        networks: ["10.0.0.217/24"]
        gateway chassis: [37601a52-d66a-4eac-be13-b9f93095ebf1 21762b93-5d6c-4684-ac52-6018d9d35217 95b77591-d3e9-4a79-b7b6-1e817c4faa48]
router c8ac6efa-395d-48d0-906f-1bc4404070a9 (neutron-94c445b3-5912-455f-9708-e90c8ab50b73) (aka Router_eNet_2)
    port lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf
        mac: "fa:16:3e:4f:60:ed"
        networks: ["10.0.0.214/24"]
        gateway chassis: [37601a52-d66a-4eac-be13-b9f93095ebf1 21762b93-5d6c-4684-ac52-6018d9d35217 95b77591-d3e9-4a79-b7b6-1e817c4faa48]
router 33d15737-a35b-4251-9f0b-672a3f52071c (neutron-c7f7ddc8-2a7c-4ecc-8c46-17713a39b9ca) (aka Router_eNet_3)
    port lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8
        mac: "fa:16:3e:71:c4:0c"
        networks: ["10.0.0.211/24"]
        gateway chassis: [21762b93-5d6c-4684-ac52-6018d9d35217 37601a52-d66a-4eac-be13-b9f93095ebf1 95b77591-d3e9-4a79-b7b6-1e817c4faa48]

(overcloud) [root@controller-0 ~]# ovn-nbctl lrp-get-gateway-chassis lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48
lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_21762b93-5d6c-4684-ac52-6018d9d35217     3
lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_37601a52-d66a-4eac-be13-b9f93095ebf1     2
lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_95b77591-d3e9-4a79-b7b6-1e817c4faa48     1
(overcloud) [root@controller-0 ~]# ovn-nbctl lrp-get-gateway-chassis lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf
lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_95b77591-d3e9-4a79-b7b6-1e817c4faa48     3
lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_37601a52-d66a-4eac-be13-b9f93095ebf1     2
lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_21762b93-5d6c-4684-ac52-6018d9d35217     1
(overcloud) [root@controller-0 ~]# ovn-nbctl lrp-get-gateway-chassis lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8
lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_37601a52-d66a-4eac-be13-b9f93095ebf1     3
lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_21762b93-5d6c-4684-ac52-6018d9d35217     2
lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_95b77591-d3e9-4a79-b7b6-1e817c4faa48     1

(overcloud) [root@controller-0 ~]# ping 10.0.0.211
PING 10.0.0.211 (10.0.0.211) 56(84) bytes of data.
64 bytes from 10.0.0.211: icmp_seq=1 ttl=254 time=2.50 ms
64 bytes from 10.0.0.211: icmp_seq=2 ttl=254 time=0.458 ms
--- 10.0.0.211 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1006ms
rtt min/avg/max/mdev = 0.458/1.483/2.508/1.025 ms
(overcloud) [root@controller-0 ~]# ping 10.0.0.214
PING 10.0.0.214 (10.0.0.214) 56(84) bytes of data.
64 bytes from 10.0.0.214: icmp_seq=1 ttl=254 time=0.741 ms
64 bytes from 10.0.0.214: icmp_seq=2 ttl=254 time=0.239 ms
^C
--- 10.0.0.214 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.239/0.490/0.741/0.251 ms
(overcloud) [root@controller-0 ~]# ping 10.0.0.217
PING 10.0.0.217 (10.0.0.217) 56(84) bytes of data.
64 bytes from 10.0.0.217: icmp_seq=1 ttl=254 time=0.969 ms
64 bytes from 10.0.0.217: icmp_seq=2 ttl=254 time=0.397 ms
^C
--- 10.0.0.217 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 0.397/0.683/0.969/0.286 ms

# ovn-sbctl show
Chassis "21762b93-5d6c-4684-ac52-6018d9d35217"
    hostname: "controller-1.localdomain"
    Encap geneve
        ip: "172.17.2.16"
        options: {csum="true"}
    Port_Binding "cr-lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48"
Chassis "95b77591-d3e9-4a79-b7b6-1e817c4faa48"
    hostname: "controller-0.localdomain"
    Encap geneve
        ip: "172.17.2.13"
        options: {csum="true"}
    Port_Binding "cr-lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf"
Chassis "37601a52-d66a-4eac-be13-b9f93095ebf1"
    hostname: "controller-2.localdomain"
    Encap geneve
        ip: "172.17.2.12"
        options: {csum="true"}
    Port_Binding "cr-lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8"
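
The distribution check above could also be scripted. The following is only a sketch: the helper name `primary_chassis` is hypothetical, and it assumes each output line of `lrp-get-gateway-chassis` has the form `<lrp-name>_<chassis-name> <priority>` with no underscore in the chassis name, as in the output pasted in this comment.

```python
def primary_chassis(lrp_output):
    """Return the chassis with the highest priority in the pasted output.

    Assumes lines of the form "<lrp-name>_<chassis-name> <priority>".
    Returns None when the output is empty.
    """
    best = None
    for line in lrp_output.strip().splitlines():
        name, priority = line.split()
        chassis = name.rsplit("_", 1)[1]
        if best is None or int(priority) > best[1]:
            best = (chassis, int(priority))
    return best[0] if best else None


# Output pasted from the three lrp-get-gateway-chassis calls in this comment.
OUTPUTS = [
    """
    lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_21762b93-5d6c-4684-ac52-6018d9d35217 3
    lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_37601a52-d66a-4eac-be13-b9f93095ebf1 2
    lrp-d80d1f0e-a7e2-45bf-854d-6d87246aae48_95b77591-d3e9-4a79-b7b6-1e817c4faa48 1
    """,
    """
    lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_95b77591-d3e9-4a79-b7b6-1e817c4faa48 3
    lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_37601a52-d66a-4eac-be13-b9f93095ebf1 2
    lrp-781c85b7-b59b-4f6e-a7f4-e6f5228f55bf_21762b93-5d6c-4684-ac52-6018d9d35217 1
    """,
    """
    lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_37601a52-d66a-4eac-be13-b9f93095ebf1 3
    lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_21762b93-5d6c-4684-ac52-6018d9d35217 2
    lrp-95eceb8e-a07f-4868-9c97-388e7da2a3e8_95b77591-d3e9-4a79-b7b6-1e817c4faa48 1
    """,
]

primaries = [primary_chassis(output) for output in OUTPUTS]
# True when every router port has a different highest-priority chassis.
all_distinct = len(set(primaries)) == len(primaries)
print(primaries, all_distinct)
```

For the three routers here, the highest-priority chassis are all different and match the `cr-lrp-*` Port_Binding entries shown by `ovn-sbctl show`.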
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086