Description of problem:
Deployed an OSP13-HA-OVN setup with DVR enabled. Created a router with an external gateway, a tenant network, and an instance. When SNAT traffic was sent (from the VM to the external network, e.g. ping 8.8.8.8), the traffic went out through the compute node, because the router's gateway port was scheduled on a compute node.

# ovn-nbctl show
switch a92f161d-7fc6-4c5b-ade4-d0e54aa5d250 (neutron-2ff4aa7c-67d2-4e13-b142-a19e1656e601) (aka net-64-2)
    port d66f3e17-3a2a-4373-8a15-022bfb471182
        type: localport
        addresses: ["fa:16:3e:82:f8:8d 10.0.2.2"]
    port 9adb74a6-5907-4dc2-8ad0-300f21374b2b
        type: router
        router-port: lrp-9adb74a6-5907-4dc2-8ad0-300f21374b2b
switch c12c9102-9d25-4a5c-ae28-c745780545e3 (neutron-f5a82fd2-c2a1-47e4-b3a1-4eb4376bba0b) (aka net-64-1)
    port c7a2f493-e713-46bb-87ab-53a3ea32cee2
        type: localport
        addresses: ["fa:16:3e:16:9e:94 10.0.1.2"]
    port ce09abbc-9acb-40da-8d29-b399aae862d1
        type: router
        router-port: lrp-ce09abbc-9acb-40da-8d29-b399aae862d1
switch 403eac5f-ebf9-4024-aca6-637c24b37ec4 (neutron-250c09dc-3cde-480d-b15e-5db4468627fa) (aka nova)
    port 40339056-b966-425f-8d1a-387456c54be5
        type: router
        router-port: lrp-40339056-b966-425f-8d1a-387456c54be5
    port provnet-250c09dc-3cde-480d-b15e-5db4468627fa
        type: localnet
        addresses: ["unknown"]
    port 8608882c-59c8-410c-a5b7-e2042a27e54d
        type: localport
        addresses: ["fa:16:3e:1a:5e:b7"]
router ff4079a7-07ca-4b11-bb09-cac7a3775e9a (neutron-99fb7f82-fa33-4dff-8f6f-c7398d7115c6) (aka Router_eNet)
    port lrp-9adb74a6-5907-4dc2-8ad0-300f21374b2b
        mac: "fa:16:3e:39:85:26"
        networks: ["10.0.2.1/24"]
    port lrp-ce09abbc-9acb-40da-8d29-b399aae862d1
        mac: "fa:16:3e:5b:8c:19"
        networks: ["10.0.1.1/24"]
    port lrp-40339056-b966-425f-8d1a-387456c54be5
        mac: "fa:16:3e:3c:76:c7"
        networks: ["10.0.0.218/24"]
        gateway chassis: [b599e928-3cd6-47b3-b0ae-74be1d692eb8 fd827ad7-c1c2-4227-91b2-273bd1e5aa1f 8b2bf146-682c-448f-ade3-c22296cd0aca d0201d1d-4b46-4a59-a406-eeff6c3848a9 bac0892c-69dd-4523-a6a3-685fed18cfdd]
    nat 5b2e0480-b280-4cc6-a1a6-97aa1b169fb1
        external ip: "10.0.0.218"
        logical ip: "10.0.1.0/24"
        type: "snat"
    nat fbffd009-e17d-4c1b-8848-ad09f86f15f8
        external ip: "10.0.0.218"
        logical ip: "10.0.2.0/24"
        type: "snat"

(overcloud) [root@controller-0 ~]# ping 10.0.0.218
PING 10.0.0.218 (10.0.0.218) 56(84) bytes of data.
64 bytes from 10.0.0.218: icmp_seq=2 ttl=254 time=0.453 ms
64 bytes from 10.0.0.218: icmp_seq=3 ttl=254 time=4.07 ms
^C
--- 10.0.0.218 ping statistics ---
3 packets transmitted, 2 received, 33% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.453/2.265/4.078/1.813 ms

(overcloud) [root@controller-0 ~]# ovn-nbctl list Logical_Router_Port
_uuid               : cbc83a49-0c10-48cb-9f30-390b26a3a457
enabled             : []
external_ids        : {"neutron:revision_number"="5"}
gateway_chassis     : [24e4c877-6c89-474c-a018-41bb49dd38c2, 78022d6e-8085-46bb-9a7e-5a757c97fc03, b9ce47f7-81c4-4f0f-ad36-0650aa4d5178, e3c7eae5-6a18-4475-b1a9-c66876b9c2fc, e4265e61-72d1-400e-9224-1449c16baed7]
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:3c:76:c7"
name                : "lrp-40339056-b966-425f-8d1a-387456c54be5"
networks            : ["10.0.0.218/24"]
options             : {}
peer                : []

_uuid               : 7e2f58ee-a20f-48e6-ae2d-b0c25f7fd9bb
enabled             : []
external_ids        : {"neutron:revision_number"="7"}
gateway_chassis     : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:39:85:26"
name                : "lrp-9adb74a6-5907-4dc2-8ad0-300f21374b2b"
networks            : ["10.0.2.1/24"]
options             : {}
peer                : []

_uuid               : a478c626-c4b9-4cba-9f7e-68871ebc09e5
enabled             : []
external_ids        : {"neutron:revision_number"="7"}
gateway_chassis     : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:5b:8c:19"
name                : "lrp-ce09abbc-9acb-40da-8d29-b399aae862d1"
networks            : ["10.0.1.1/24"]
options             : {}
peer                : []

(overcloud) [root@controller-0 ~]# ovn-sbctl show
Chassis "fd827ad7-c1c2-4227-91b2-273bd1e5aa1f"
    hostname: "controller-2.localdomain"
    Encap geneve
        ip: "172.17.2.22"
        options: {csum="true"}
Chassis "8b2bf146-682c-448f-ade3-c22296cd0aca"
    hostname: "compute-1.localdomain"
    Encap geneve
        ip: "172.17.2.11"
        options: {csum="true"}
Chassis "d0201d1d-4b46-4a59-a406-eeff6c3848a9"
    hostname: "controller-0.localdomain"
    Encap geneve
        ip: "172.17.2.10"
        options: {csum="true"}
Chassis "b599e928-3cd6-47b3-b0ae-74be1d692eb8"
    hostname: "controller-1.localdomain"
    Encap geneve
        ip: "172.17.2.14"
        options: {csum="true"}
    Port_Binding "cr-lrp-40339056-b966-425f-8d1a-387456c54be5"
Chassis "bac0892c-69dd-4523-a6a3-685fed18cfdd"
    hostname: "compute-0.localdomain"
    Encap geneve
        ip: "172.17.2.18"
        options: {csum="true"}

Version-Release number of selected component (if applicable):
# cat /etc/yum.repos.d/latest-installed
13 -p 2018-04-03.3

(overcloud) [root@controller-0 ~]# rpm -qa | grep ovn
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
python-networking-ovn-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
openstack-nova-novncproxy-17.0.2-0.20180323024604.0390d5f.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch
(overcloud) [root@controller-0 ~]#

How reproducible:
Always

Steps to Reproduce:
1. Deploy an OSP13-HA-OVN setup with DVR enabled.
2. Create a router with an external gateway, a tenant network, and an instance.
3. Do not assign a FIP to the VM.
4. Connect to the VM via console and ping Google DNS (8.8.8.8).
5. Take a tcpdump on the external interface of the compute node; you can see that the SNAT traffic went through the compute node instead of a controller node.

Actual results:
SNAT traffic leaves through a compute node.

Expected results:
SNAT traffic leaves through a controller node.

Additional info:
Proposed a fix upstream: https://review.openstack.org/#/c/559806/
Hi Anil,

It looks like all SNAT traffic is going through one compute node. I have two instances, one on each compute node, and both instances' traffic goes out through compute-1. The expectation is that each instance's traffic goes out through a controller node.

Expected:
---------
| VM1 | ----> goes out through a controller node
---------
Compute-0

---------
| VM2 | ----> goes out through a controller node
---------
Compute-1

Actual:
---------
| VM1 |
---------
Compute-0
   |
   |
   V
---------
| VM2 | ----> br-ex
---------
Compute-1
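For context on the diagram above: OVN binds a router's gateway port to a single chassis at a time, namely the highest-priority gateway chassis that is up, so SNAT traffic for every VM behind that router exits via that one node (the choice is per router, not per VM). A small sketch of that selection rule, with illustrative chassis/priority pairs (the priorities here are assumed, not taken from this report):

```shell
# Illustrative "chassis priority" pairs, shaped like the output of
# `ovn-nbctl lrp-get-gateway-chassis <lrp>`; priorities are made up.
gw_chassis='controller-1 5
controller-2 4
compute-1 3
controller-0 2
compute-0 1'

# The highest-priority live chassis hosts the gateway port.
echo "$gw_chassis" | sort -k2 -nr | head -n1 | awk '{print $1}'
```

This is why both VMs share one exit node even though they run on different computes: scheduling follows the router's gateway-chassis priorities, not VM placement.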
Thanks Eran. In overcloud_deploy.sh we are using the template file [1], but this file does not set OVNCMSOptions: "enable-chassis-as-gw" as introduced in the commit https://github.com/openstack/tripleo-heat-templates/commit/71d59bb0a34349f3ed2b95d70452b771cc8039d2#diff-bdb9af0031906100cdbc39af8dfa1e6e

Maybe adding this option to [1] would resolve the issue. Alternatively, since this must be enabled by default on controller nodes in an OVN deployment, can we add it to a generic template that runs on controller nodes?

[1] /usr/share/openstack-tripleo-heat-templates/environments/services-docker/neutron-ovn-dvr-ha.yaml
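A minimal sketch of the workaround described above, assuming the standard tripleo `parameter_defaults` mechanism and the `OVNCMSOptions` parameter from the linked commit (the environment-file name here is illustrative, and whether the option must be scoped to the controller role is an untested assumption):

```shell
# Write a custom environment file that tags chassis as gateway
# candidates, then pass it to the deploy command with -e, e.g.:
#   openstack overcloud deploy ... -e ovn-chassis-gw.yaml
cat > ovn-chassis-gw.yaml <<'EOF'
parameter_defaults:
  # ovn-controller advertises the chassis as a gateway candidate
  OVNCMSOptions: "enable-chassis-as-gw"
EOF
grep -c 'enable-chassis-as-gw' ovn-chassis-gw.yaml
```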
*** This bug has been marked as a duplicate of bug 1570499 ***