Bug 1566050 - OVN-DVR: SNAT traffic going through the compute cause router scheduled on compute node
Keywords:
Status: CLOSED DUPLICATE of bug 1570499
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: anil venkata
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On: 1570499
Blocks:
 
Reported: 2018-04-11 12:23 UTC by Eran Kuris
Modified: 2019-09-09 14:23 UTC
CC List: 10 users

Fixed In Version: openstack-tripleo-heat-templates-8.0.2-0.20180410170331.a39634a.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-27 16:50:30 UTC
Target Upstream Version:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1766183 0 None None None 2018-04-23 09:11:00 UTC
OpenStack gerrit 559806 0 None MERGED Fix typo in ovn_cms_options config 2020-03-26 15:31:22 UTC
OpenStack gerrit 560858 0 None MERGED Fix typo in ovn_cms_options config 2020-03-26 15:31:22 UTC
OpenStack gerrit 563503 0 None MERGED Add OVNCMSOptions in dvr environment files 2020-03-26 15:31:22 UTC

Description Eran Kuris 2018-04-11 12:23:37 UTC
Description of problem:
Deployed an OSP13 HA OVN setup with DVR enabled.
Created a router with an external gateway, a tenant network, and an instance.
When SNAT traffic was sent (from the VM to the external network, e.g. ping 8.8.8.8), the traffic went through the compute node because the router was scheduled on the compute node.


]# ovn-nbctl show
switch a92f161d-7fc6-4c5b-ade4-d0e54aa5d250 (neutron-2ff4aa7c-67d2-4e13-b142-a19e1656e601) (aka net-64-2)
    port d66f3e17-3a2a-4373-8a15-022bfb471182
        type: localport
        addresses: ["fa:16:3e:82:f8:8d 10.0.2.2"]
    port 9adb74a6-5907-4dc2-8ad0-300f21374b2b
        type: router
        router-port: lrp-9adb74a6-5907-4dc2-8ad0-300f21374b2b
switch c12c9102-9d25-4a5c-ae28-c745780545e3 (neutron-f5a82fd2-c2a1-47e4-b3a1-4eb4376bba0b) (aka net-64-1)
    port c7a2f493-e713-46bb-87ab-53a3ea32cee2
        type: localport
        addresses: ["fa:16:3e:16:9e:94 10.0.1.2"]
    port ce09abbc-9acb-40da-8d29-b399aae862d1
        type: router
        router-port: lrp-ce09abbc-9acb-40da-8d29-b399aae862d1
switch 403eac5f-ebf9-4024-aca6-637c24b37ec4 (neutron-250c09dc-3cde-480d-b15e-5db4468627fa) (aka nova)
    port 40339056-b966-425f-8d1a-387456c54be5
        type: router
        router-port: lrp-40339056-b966-425f-8d1a-387456c54be5
    port provnet-250c09dc-3cde-480d-b15e-5db4468627fa
        type: localnet
        addresses: ["unknown"]
    port 8608882c-59c8-410c-a5b7-e2042a27e54d
        type: localport
        addresses: ["fa:16:3e:1a:5e:b7"]
router ff4079a7-07ca-4b11-bb09-cac7a3775e9a (neutron-99fb7f82-fa33-4dff-8f6f-c7398d7115c6) (aka Router_eNet)
    port lrp-9adb74a6-5907-4dc2-8ad0-300f21374b2b
        mac: "fa:16:3e:39:85:26"
        networks: ["10.0.2.1/24"]
    port lrp-ce09abbc-9acb-40da-8d29-b399aae862d1
        mac: "fa:16:3e:5b:8c:19"
        networks: ["10.0.1.1/24"]
    port lrp-40339056-b966-425f-8d1a-387456c54be5
        mac: "fa:16:3e:3c:76:c7"
        networks: ["10.0.0.218/24"]
        gateway chassis: [b599e928-3cd6-47b3-b0ae-74be1d692eb8 fd827ad7-c1c2-4227-91b2-273bd1e5aa1f 8b2bf146-682c-448f-ade3-c22296cd0aca d0201d1d-4b46-4a59-a406-eeff6c3848a9 bac0892c-69dd-4523-a6a3-685fed18cfdd]
    nat 5b2e0480-b280-4cc6-a1a6-97aa1b169fb1
        external ip: "10.0.0.218"
        logical ip: "10.0.1.0/24"
        type: "snat"
    nat fbffd009-e17d-4c1b-8848-ad09f86f15f8
        external ip: "10.0.0.218"
        logical ip: "10.0.2.0/24"
        type: "snat"
(overcloud) [root@controller-0 ~]# ping 10.0.0.218
PING 10.0.0.218 (10.0.0.218) 56(84) bytes of data.
64 bytes from 10.0.0.218: icmp_seq=2 ttl=254 time=0.453 ms
64 bytes from 10.0.0.218: icmp_seq=3 ttl=254 time=4.07 ms
^C
--- 10.0.0.218 ping statistics ---
3 packets transmitted, 2 received, 33% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.453/2.265/4.078/1.813 ms
(overcloud) [root@controller-0 ~]# ovn-nbctl list Logical_Router_Port
_uuid               : cbc83a49-0c10-48cb-9f30-390b26a3a457
enabled             : []
external_ids        : {"neutron:revision_number"="5"}
gateway_chassis     : [24e4c877-6c89-474c-a018-41bb49dd38c2, 78022d6e-8085-46bb-9a7e-5a757c97fc03, b9ce47f7-81c4-4f0f-ad36-0650aa4d5178, e3c7eae5-6a18-4475-b1a9-c66876b9c2fc, e4265e61-72d1-400e-9224-1449c16baed7]
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:3c:76:c7"
name                : "lrp-40339056-b966-425f-8d1a-387456c54be5"
networks            : ["10.0.0.218/24"]
options             : {}
peer                : []

_uuid               : 7e2f58ee-a20f-48e6-ae2d-b0c25f7fd9bb
enabled             : []
external_ids        : {"neutron:revision_number"="7"}
gateway_chassis     : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:39:85:26"
name                : "lrp-9adb74a6-5907-4dc2-8ad0-300f21374b2b"
networks            : ["10.0.2.1/24"]
options             : {}
peer                : []

_uuid               : a478c626-c4b9-4cba-9f7e-68871ebc09e5
enabled             : []
external_ids        : {"neutron:revision_number"="7"}
gateway_chassis     : []
ipv6_ra_configs     : {}
mac                 : "fa:16:3e:5b:8c:19"
name                : "lrp-ce09abbc-9acb-40da-8d29-b399aae862d1"
networks            : ["10.0.1.1/24"]
options             : {}
peer                : []
(overcloud) [root@controller-0 ~]# ovn-sbctl  show
Chassis "fd827ad7-c1c2-4227-91b2-273bd1e5aa1f"
    hostname: "controller-2.localdomain"
    Encap geneve
        ip: "172.17.2.22"
        options: {csum="true"}
Chassis "8b2bf146-682c-448f-ade3-c22296cd0aca"
    hostname: "compute-1.localdomain"
    Encap geneve
        ip: "172.17.2.11"
        options: {csum="true"}
Chassis "d0201d1d-4b46-4a59-a406-eeff6c3848a9"
    hostname: "controller-0.localdomain"
    Encap geneve
        ip: "172.17.2.10"
        options: {csum="true"}
Chassis "b599e928-3cd6-47b3-b0ae-74be1d692eb8"
    hostname: "controller-1.localdomain"
    Encap geneve
        ip: "172.17.2.14"
        options: {csum="true"}
    Port_Binding "cr-lrp-40339056-b966-425f-8d1a-387456c54be5"
Chassis "bac0892c-69dd-4523-a6a3-685fed18cfdd"
    hostname: "compute-0.localdomain"
    Encap geneve
        ip: "172.17.2.18"
        options: {csum="true"}
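
In the `ovn-nbctl show` output above, the gateway-chassis list for lrp-40339056-b966-425f-8d1a-387456c54be5 begins with chassis b599e928-..., and the `ovn-sbctl show` output shows the cr-lrp port bound on controller-1, which is that same chassis. Assuming the chassis list is printed in descending priority order (an assumption, not something this report confirms), a small shell sketch can pull the expected binding chassis out of the pasted line:

```shell
# Illustrative only: parse the "gateway chassis: [...]" line pasted above
# and take the first UUID, assuming the list is ordered highest priority
# first (i.e. the chassis where cr-lrp-... should bind).
line='gateway chassis: [b599e928-3cd6-47b3-b0ae-74be1d692eb8 fd827ad7-c1c2-4227-91b2-273bd1e5aa1f 8b2bf146-682c-448f-ade3-c22296cd0aca d0201d1d-4b46-4a59-a406-eeff6c3848a9 bac0892c-69dd-4523-a6a3-685fed18cfdd]'
# Strip everything up to "[" and from "]" on, then print the first field.
first=$(printf '%s\n' "$line" | sed 's/.*\[//; s/\].*//' | awk '{print $1}')
echo "$first"
```

The first UUID here matches controller-1, where the cr-lrp port is shown as bound in the `ovn-sbctl show` output.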

Version-Release number of selected component (if applicable):

# cat /etc/yum.repos.d/latest-installed 
13   -p 2018-04-03.3
(overcloud) [root@controller-0 ~]# rpm -qa | grep ovn 
openvswitch-ovn-central-2.9.0-15.el7fdp.x86_64
openvswitch-ovn-common-2.9.0-15.el7fdp.x86_64
python-networking-ovn-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
openvswitch-ovn-host-2.9.0-15.el7fdp.x86_64
openstack-nova-novncproxy-17.0.2-0.20180323024604.0390d5f.el7ost.noarch
novnc-0.6.1-1.el7ost.noarch
python-networking-ovn-metadata-agent-4.0.1-0.20180315174741.a57c70e.el7ost.noarch
puppet-ovn-12.3.1-0.20180221062110.4b16f7c.el7ost.noarch
(overcloud) [root@controller-0 ~]# 

How reproducible:
always

Steps to Reproduce:
1. Deploy an OSP13 HA OVN setup with DVR enabled.
2. Create a router with an external gateway, a tenant network, and an instance.
3. Do not assign a FIP to the VM.
4. Connect to the VM via console and ping the Google DNS server 8.8.8.8.
5. Take a tcpdump on the eth interface of the compute node; you can see that the SNAT traffic went through the compute node instead of the controller node.
Actual results:


Expected results:


Additional info:

Comment 1 anil venkata 2018-04-11 12:26:54 UTC
Proposed a fix upstream: https://review.openstack.org/#/c/559806/

Comment 6 Eran Kuris 2018-04-22 13:07:21 UTC
Hi Anil,

It looks like the SNAT traffic is going through one compute node.
I have two instances, one on each compute node. Both instances' traffic goes through
compute-1.
The expectation is that each instance's traffic will go through the controller node.

Expected:
                      
 ---------                           
|  VM1   |  ----> go through controller node
 ---------     
Compute-0


 ---------
|  VM2   |  -----> go through controller node
 ---------
Compute-1


Actual:

 ---------                           
|  VM1   |  
 ---------     
Compute-0
    |
    |
    V
 ---------
|  VM2   |  -----> br-ex
 ---------
Compute-1

Comment 7 anil venkata 2018-04-23 06:34:59 UTC
Thanks Eran.

In overcloud_deploy.sh we are using the template file [1], but this file does not contain

OVNCMSOptions: "enable-chassis-as-gw"

as per the commit https://github.com/openstack/tripleo-heat-templates/commit/71d59bb0a34349f3ed2b95d70452b771cc8039d2#diff-bdb9af0031906100cdbc39af8dfa1e6e

Maybe adding this to [1] would help resolve the issue. Alternatively, since this must be enabled by default on a controller node in an OVN deployment, can we add it to a generic template that runs on the controller node?

[1] /usr/share/openstack-tripleo-heat-templates/environments/services-docker/neutron-ovn-dvr-ha.yaml
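
For reference, the setting Comment 7 describes would look roughly like the following in a tripleo environment file. This is a sketch modeled on the linked tripleo-heat-templates commit, not the exact contents of the shipped neutron-ovn-dvr-ha.yaml; the real fix may scope the option per role (e.g. under a role-specific parameters section) rather than globally:

```yaml
# Hypothetical sketch of the missing setting: mark chassis running
# ovn-controller as candidate gateway chassis so routers are scheduled
# on controller nodes rather than on computes.
parameter_defaults:
  OVNCMSOptions: "enable-chassis-as-gw"
```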

Comment 8 Daniel Alvarez Sanchez 2018-04-27 16:50:30 UTC

*** This bug has been marked as a duplicate of bug 1570499 ***

