Bug 1851731

Summary: OVN-DVR OSP 16 with geneve tenant network and flat public network tries ARP resolution of IPs that are off subnet
Product: Red Hat OpenStack Reporter: Andreas Karis <akaris>
Component: openstack-neutron Assignee: Assaf Muller <amuller>
Status: CLOSED DUPLICATE QA Contact: Eran Kuris <ekuris>
Severity: high Docs Contact:
Priority: unspecified    
Version: 16.0 (Train) CC: amuller, chrisw, fshaikh, jlibosva, rsafrono, scohen, suchaudh
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-30 11:34:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andreas Karis 2020-06-28 13:20:07 UTC
Observation

OVN-DVR OSP 16 with geneve tenant network and flat public network

With DVR enabled, OVN tries to ARP resolve everything locally, even IPs that are not on its subnet:
~~~
[root@controller-0 ~]# egrep 'dvr|distr' /var/lib/config-data/puppet-generated/neutron -R
grep: /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugin.ini: No such file or directory
/var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/ml2_conf.ini:enable_distributed_floating_ip=true
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:router_distributed=true
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:enable_dvr=true
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:# distributions. (string value)
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:# distributions. (string value)
~~~

~~~
[root@compute-0 ~]# ovn-nbctl find NAT type=dnat_and_snat | tail
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
type                : dnat_and_snat

_uuid               : 9943e8da-bf62-4bf7-bc90-2d1a0078d406
external_ids        : {"neutron:fip_external_mac"="fa:16:3e:41:39:95", "neutron:fip_id"="9e37f80c-5032-4dac-8039-f9b08d090031", "neutron:fip_port_id"="28583bb2-b290-4b8d-9c5d-8874257c669b", "neutron:revision_number"="10", "neutron:router_name"="neutron-ceae8713-694a-4d58-af85-cf44993ef0af"}
external_ip         : "10.0.0.123"
external_mac        : "fa:16:3e:41:39:95"
logical_ip          : "192.168.0.135"
logical_port        : "28583bb2-b290-4b8d-9c5d-8874257c669b"
options             : {}
type                : dnat_and_snat
~~~
external_mac set -> dvr
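
The check noted above (a NAT row with `external_mac` set means the floating IP is handled in distributed mode) can be sketched as a small parser over the text printed by `ovn-nbctl find NAT`. This is only an illustration: the helper names `parse_nat_record` and `is_distributed` are hypothetical, the sample rows are abbreviated from the output in this report, and the rule "non-empty `external_mac` implies DVR" is the reporter's reading rather than a documented OVN guarantee.

```python
def parse_nat_record(text):
    """Parse one 'key : value' record as printed by `ovn-nbctl find NAT`."""
    record = {}
    for line in text.splitlines():
        # Columns are printed as "name<padding> : value"; split on the first " : ".
        key, sep, value = line.partition(" : ")
        if sep:
            record[key.strip()] = value.strip()
    return record


def is_distributed(record):
    """Assumption from this report: a non-empty external_mac means a distributed FIP."""
    return record.get("external_mac", "[]") not in ("[]", "")


# Abbreviated rows copied from the two captures in this report.
SAMPLE_DVR = '''\
external_ip         : "10.0.0.123"
external_mac        : "fa:16:3e:41:39:95"
logical_ip          : "192.168.0.135"
type                : dnat_and_snat
'''

SAMPLE_CENTRALIZED = '''\
external_ip         : "10.0.0.123"
external_mac        : []
logical_ip          : "192.168.0.135"
type                : dnat_and_snat
'''

if __name__ == "__main__":
    for name, sample in (("dvr", SAMPLE_DVR), ("centralized", SAMPLE_CENTRALIZED)):
        print(name, is_distributed(parse_nat_record(sample)))
```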

~~~
[root@compute-0 ~]# tcpdump -nne -i ens5 host 10.74.253.161
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
13:13:54.910973 fa:16:3e:41:39:95 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.74.253.161 tell 10.0.0.123, length 28
13:13:59.916972 fa:16:3e:41:39:95 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.74.253.161 tell 10.0.0.123, length 28
~~~

~~~
[root@controller-0 ~]# tcpdump -nne -i ens5 host 10.74.253.161
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
13:13:54.911025 fa:16:3e:41:39:95 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.74.253.161 tell 10.0.0.123, length 28
13:13:59.917033 fa:16:3e:41:39:95 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.74.253.161 tell 10.0.0.123, length 28
~~~
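
For reference, the expected next-hop decision can be sketched with Python's `ipaddress` module. The subnet `10.0.0.0/24` and gateway `10.0.0.1` are the values reported for `sub-public` in Comment 1; the function name `arp_target` is hypothetical, and this only models ordinary IPv4 forwarding logic, not OVN's implementation. Since 10.74.253.161 is outside 10.0.0.0/24, the ARP request should be for the gateway rather than for the destination itself.

```python
import ipaddress


def arp_target(subnet, gateway, destination):
    """Return the address a host on `subnet` should ARP for to reach `destination`."""
    network = ipaddress.ip_network(subnet)
    dest = ipaddress.ip_address(destination)
    if dest in network:
        # On-link destination: resolve it directly.
        return dest
    # Off-subnet destination: resolve the gateway instead.
    return ipaddress.ip_address(gateway)


if __name__ == "__main__":
    # Destination seen in the ARP requests captured above.
    print(arp_target("10.0.0.0/24", "10.0.0.1", "10.74.253.161"))  # 10.0.0.1
    # An on-subnet address for comparison.
    print(arp_target("10.0.0.0/24", "10.0.0.1", "10.0.0.87"))  # 10.0.0.87
```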

~~~
(overcloud) [stack@undercloud-0 ~]$ openstack server remove floating ip test 10.0.0.123
(overcloud) [stack@undercloud-0 ~]$ openstack server add floating ip test 10.0.0.123
(overcloud) [stack@undercloud-0 ~]$ ssh cloud-user@10.0.0.123
Warning: Permanently added '10.0.0.123' (ECDSA) to the list of known hosts.
Last login: Sun Jun 28 09:05:53 2020 from 10.0.0.87
[cloud-user@test ~]$ sudo -i
[root@test ~]# ping google.com
^C
[root@test ~]# ^C
[root@test ~]# ^C
[root@test ~]# exit
logout
[cloud-user@test ~]$ exit
logout
Connection to 10.0.0.123 closed.
~~~

As soon as I switch this to not use DVR, it works fine:
~~~
[root@controller-0 ~]# egrep 'dvr|distr' /var/lib/config-data/puppet-generated/neutron -R
grep: /var/lib/config-data/puppet-generated/neutron/etc/neutron/plugin.ini: No such file or directory
/var/lib/config-data/puppet-generated/neutron/etc/neutron/plugins/ml2/ml2_conf.ini:enable_distributed_floating_ip=false
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:router_distributed=false
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:enable_dvr=false
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:# distributions. (string value)
/var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf:# distributions. (string value)
[root@controller-0 ~]# podman restart neutron_api
~~~

~~~
(overcloud) [stack@undercloud-0 ~]$ openstack server remove floating ip test 10.0.0.123
(overcloud) [stack@undercloud-0 ~]$ openstack server add floating ip test 10.0.0.123
(overcloud) [stack@undercloud-0 ~]$ ssh cloud-user@10.0.0.123
Warning: Permanently added '10.0.0.123' (ECDSA) to the list of known hosts.
Last login: Sun Jun 28 09:14:04 2020 from 10.0.0.87
(reverse-i-search)`': ^C
[cloud-user@test ~]$ ping google.com
PING google.com (172.217.166.174) 56(84) bytes of data.
^C
--- google.com ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

[cloud-user@test ~]$ ping google.com
PING google.com (172.217.166.174) 56(84) bytes of data.
64 bytes from bom07s20-in-f14.1e100.net (172.217.166.174): icmp_seq=1 ttl=102 time=304 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1000ms
rtt min/avg/max/mdev = 304.507/304.507/304.507/0.000 ms
[cloud-user@test ~]$ 
~~~

~~~
[root@controller-0 ~]# tcpdump -nne -i ens5 host 10.74.253.161
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens5, link-type EN10MB (Ethernet), capture size 262144 bytes
13:16:37.677013 fa:16:3e:d2:58:d3 > 52:54:00:f4:6d:fe, ethertype IPv4 (0x0800), length 82: 10.0.0.123.33918 > 10.74.253.161.53: 12484+ PTR? 87.0.0.10.in-addr.arpa. (40)
13:16:38.066089 52:54:00:f4:6d:fe > fa:16:3e:d2:58:d3, ethertype IPv4 (0x0800), length 152: 10.74.253.161.53 > 10.0.0.123.33918: 12484 NXDomain 0/1/0 (110)

~~~

~~~
[root@compute-0 ~]# ovn-nbctl find NAT type=dnat_and_snat | tail
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
type                : dnat_and_snat

_uuid               : 5fa69397-97d6-4f13-a446-0828a98fefbc
external_ids        : {"neutron:fip_id"="9e37f80c-5032-4dac-8039-f9b08d090031", "neutron:fip_port_id"="28583bb2-b290-4b8d-9c5d-8874257c669b", "neutron:revision_number"="14", "neutron:router_name"="neutron-ceae8713-694a-4d58-af85-cf44993ef0af"}
external_ip         : "10.0.0.123"
external_mac        : []
logical_ip          : "192.168.0.135"
logical_port        : []
options             : {}
type                : dnat_and_snat
[root@compute-0 ~]# 
~~~

sosreports working:

Comment 1 Andreas Karis 2020-06-28 13:23:35 UTC

(overcloud) [stack@undercloud-0 shift]$ neutron net-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+----------------------------+----------------------------------+-----------------------------------------------------+
| id                                   | name                       | tenant_id                        | subnets                                             |
+--------------------------------------+----------------------------+----------------------------------+-----------------------------------------------------+
| 00249ba5-fb05-4f26-84ed-09af3fc6c66f | stackshift-2hpdv-openshift | 969aabe81e0749e599daf64c874abbcb | ada65210-f9cf-4c2e-805e-539ff2124678 10.0.0.0/16    |
| 6dc39a72-1bfd-41ae-9906-aed6d13508e0 | public                     | 969aabe81e0749e599daf64c874abbcb | 26231d56-2d52-41eb-95bb-c1dce14f0f00 10.0.0.0/24    |
| bc48bd41-f873-4b9a-9207-e56612123176 | private-test               | 969aabe81e0749e599daf64c874abbcb | b9d1aea2-b1e7-4b08-aa81-b63e3c6c719b 192.168.0.0/24 |
+--------------------------------------+----------------------------+----------------------------------+-----------------------------------------------------+
(overcloud) [stack@undercloud-0 shift]$ neutron subnet-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+------------------------+----------------------------------+----------------+----------------------------------------------------+
| id                                   | name                   | tenant_id                        | cidr           | allocation_pools                                   |
+--------------------------------------+------------------------+----------------------------------+----------------+----------------------------------------------------+
| 26231d56-2d52-41eb-95bb-c1dce14f0f00 | sub-public             | 969aabe81e0749e599daf64c874abbcb | 10.0.0.0/24    | {"start": "10.0.0.2", "end": "10.0.0.254"}         |
| ada65210-f9cf-4c2e-805e-539ff2124678 | stackshift-2hpdv-nodes | 969aabe81e0749e599daf64c874abbcb | 10.0.0.0/16    | {"start": "10.0.0.10", "end": "10.0.62.128"}       |
| b9d1aea2-b1e7-4b08-aa81-b63e3c6c719b | private-test-subnet    | 969aabe81e0749e599daf64c874abbcb | 192.168.0.0/24 | {"start": "192.168.0.100", "end": "192.168.0.150"} |
+--------------------------------------+------------------------+----------------------------------+----------------+----------------------------------------------------+
(overcloud) [stack@undercloud-0 shift]$ neutron net-show public
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2020-06-07T11:47:41Z                 |
| description               |                                      |
| dns_domain                |                                      |
| id                        | 6dc39a72-1bfd-41ae-9906-aed6d13508e0 |
| ipv4_address_scope        |                                      |
| ipv6_address_scope        |                                      |
| is_default                | False                                |
| l2_adjacency              | True                                 |
| mtu                       | 1500                                 |
| name                      | public                               |
| port_security_enabled     | True                                 |
| project_id                | 969aabe81e0749e599daf64c874abbcb     |
| provider:network_type     | flat                                 |
| provider:physical_network | datacentre                           |
| provider:segmentation_id  |                                      |
| qos_policy_id             |                                      |
| revision_number           | 2                                    |
| router:external           | True                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | 26231d56-2d52-41eb-95bb-c1dce14f0f00 |
| tags                      |                                      |
| tenant_id                 | 969aabe81e0749e599daf64c874abbcb     |
| updated_at                | 2020-06-07T11:47:49Z                 |
+---------------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 shift]$ neutron subnet-show public-subnet
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Unable to find subnet with name or id 'public-subnet'
(overcloud) [stack@undercloud-0 shift]$ neutron subnet-show sub-public
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-------------------+--------------------------------------------+
| Field             | Value                                      |
+-------------------+--------------------------------------------+
| allocation_pools  | {"start": "10.0.0.2", "end": "10.0.0.254"} |
| cidr              | 10.0.0.0/24                                |
| created_at        | 2020-06-07T11:47:49Z                       |
| description       |                                            |
| dns_nameservers   |                                            |
| enable_dhcp       | True                                       |
| gateway_ip        | 10.0.0.1                                   |
| host_routes       |                                            |
| id                | 26231d56-2d52-41eb-95bb-c1dce14f0f00       |
| ip_version        | 4                                          |
| ipv6_address_mode |                                            |
| ipv6_ra_mode      |                                            |
| name              | sub-public                                 |
| network_id        | 6dc39a72-1bfd-41ae-9906-aed6d13508e0       |
| project_id        | 969aabe81e0749e599daf64c874abbcb           |
| revision_number   | 0                                          |
| segment_id        |                                            |
| service_types     |                                            |
| subnetpool_id     |                                            |
| tags              |                                            |
| tenant_id         | 969aabe81e0749e599daf64c874abbcb           |
| updated_at        | 2020-06-07T11:47:49Z                       |
+-------------------+--------------------------------------------+
(overcloud) [stack@undercloud-0 shift]$ neutron net-show private
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
Unable to find network with name or id 'private'
(overcloud) [stack@undercloud-0 shift]$ neutron net-show private-test
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2020-06-28T12:47:29Z                 |
| description               |                                      |
| dns_domain                |                                      |
| id                        | bc48bd41-f873-4b9a-9207-e56612123176 |
| ipv4_address_scope        |                                      |
| ipv6_address_scope        |                                      |
| l2_adjacency              | True                                 |
| mtu                       | 1442                                 |
| name                      | private-test                         |
| port_security_enabled     | True                                 |
| project_id                | 969aabe81e0749e599daf64c874abbcb     |
| provider:network_type     | geneve                               |
| provider:physical_network |                                      |
| provider:segmentation_id  | 1                                    |
| qos_policy_id             |                                      |
| revision_number           | 2                                    |
| router:external           | False                                |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | b9d1aea2-b1e7-4b08-aa81-b63e3c6c719b |
| tags                      |                                      |
| tenant_id                 | 969aabe81e0749e599daf64c874abbcb     |
| updated_at                | 2020-06-28T12:49:41Z                 |
+---------------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 shift]$ neutron subnet-show !$-subnet
neutron subnet-show private-test-subnet
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+-------------------+----------------------------------------------------+
| Field             | Value                                              |
+-------------------+----------------------------------------------------+
| allocation_pools  | {"start": "192.168.0.100", "end": "192.168.0.150"} |
| cidr              | 192.168.0.0/24                                     |
| created_at        | 2020-06-28T12:49:41Z                               |
| description       |                                                    |
| dns_nameservers   |                                                    |
| enable_dhcp       | True                                               |
| gateway_ip        | 192.168.0.1                                        |
| host_routes       |                                                    |
| id                | b9d1aea2-b1e7-4b08-aa81-b63e3c6c719b               |
| ip_version        | 4                                                  |
| ipv6_address_mode |                                                    |
| ipv6_ra_mode      |                                                    |
| name              | private-test-subnet                                |
| network_id        | bc48bd41-f873-4b9a-9207-e56612123176               |
| project_id        | 969aabe81e0749e599daf64c874abbcb                   |
| revision_number   | 0                                                  |
| segment_id        |                                                    |
| service_types     |                                                    |
| subnetpool_id     |                                                    |
| tags              |                                                    |
| tenant_id         | 969aabe81e0749e599daf64c874abbcb                   |
| updated_at        | 2020-06-28T12:49:41Z                               |
+-------------------+----------------------------------------------------+

(overcloud) [stack@undercloud-0 shift]$ nova list
+--------------------------------------+----------------------------+--------+------------+-------------+----------------------------------------+
| ID                                   | Name                       | Status | Task State | Power State | Networks                               |
+--------------------------------------+----------------------------+--------+------------+-------------+----------------------------------------+
| 5177a106-7a27-40be-b514-4d03a1b8acb3 | stackshift-2hpdv-bootstrap | ERROR  | -          | NOSTATE     |                                        |
| 8c094490-97f0-4a74-9b11-f750f00f19f9 | stackshift-2hpdv-master-0  | ACTIVE | -          | Running     | stackshift-2hpdv-openshift=10.0.2.84   |
| d385f306-26c4-475e-a985-df26c66be250 | stackshift-2hpdv-master-1  | ACTIVE | -          | Running     | stackshift-2hpdv-openshift=10.0.1.33   |
| d1d2abfc-1b2b-4c64-a6e8-8d83547fa252 | stackshift-2hpdv-master-2  | ACTIVE | -          | Running     | stackshift-2hpdv-openshift=10.0.1.210  |
| af81a3fc-5c2e-4c5a-9f2d-5631fdb94f2e | test                       | ACTIVE | -          | Running     | private-test=192.168.0.135, 10.0.0.123 |
+--------------------------------------+----------------------------+--------+------------+-------------+----------------------------------------+


(overcloud) [stack@undercloud-0 shift]$ nova show test
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                   |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | compute-0.redhat.local                                   |
| OS-EXT-SRV-ATTR:hostname             | test                                                     |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute-0.redhat.local                                   |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000002e                                        |
| OS-EXT-SRV-ATTR:kernel_id            |                                                          |
| OS-EXT-SRV-ATTR:launch_index         | 0                                                        |
| OS-EXT-SRV-ATTR:ramdisk_id           |                                                          |
| OS-EXT-SRV-ATTR:reservation_id       | r-eivlv900                                               |
| OS-EXT-SRV-ATTR:root_device_name     | /dev/vda                                                 |
| OS-EXT-SRV-ATTR:user_data            | -                                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2020-06-28T12:53:54.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2020-06-28T12:53:48Z                                     |
| description                          | -                                                        |
| flavor:disk                          | 20                                                       |
| flavor:ephemeral                     | 0                                                        |
| flavor:extra_specs                   | {}                                                       |
| flavor:original_name                 | m1.shift                                                 |
| flavor:ram                           | 14336                                                    |
| flavor:swap                          | 0                                                        |
| flavor:vcpus                         | 8                                                        |
| hostId                               | bac915d6dcfb095a09fa9bcbe51ff919847064593dc9954d6797912f |
| host_status                          | UP                                                       |
| id                                   | af81a3fc-5c2e-4c5a-9f2d-5631fdb94f2e                     |
| image                                | rhel7 (516351e9-fc43-453e-9f2d-a3b94bbc3f97)             |
| key_name                             | stack                                                    |
| locked                               | False                                                    |
| locked_reason                        | -                                                        |
| metadata                             | {}                                                       |
| name                                 | test                                                     |
| os-extended-volumes:volumes_attached | []                                                       |
| private-test network                 | 192.168.0.135, 10.0.0.123                                |
| progress                             | 0                                                        |
| security_groups                      | default, icmpssh                                         |
| server_groups                        | []                                                       |
| status                               | ACTIVE                                                   |
| tags                                 | []                                                       |
| tenant_id                            | 969aabe81e0749e599daf64c874abbcb                         |
| trusted_image_certificates           | -                                                        |
| updated                              | 2020-06-28T12:53:54Z                                     |
| user_id                              | cf536a8e8c104f39899172f81dda6ee9                         |
+--------------------------------------+----------------------------------------------------------+

Comment 2 Andreas Karis 2020-06-28 13:25:33 UTC
sosreports when working:
/var/tmp/sosreport-compute-0-2020-06-28-tnmenfp.tar.xz
/var/tmp/sosreport-controller-0-2020-06-28-yfgitsg.tar.xz

Comment 3 Andreas Karis 2020-06-28 13:38:58 UTC
sosreports when not working:
/var/tmp/sosreport-compute-0-2020-06-28-zpkopir.tar.xz
/var/tmp/sosreport-controller-0-2020-06-28-gwrkivl.tar.xz

Comment 4 Andreas Karis 2020-06-28 13:43:09 UTC
I found this issue in a lab. I'm attaching this BZ to a customer case that I created for this purpose: 02689873
I'm just using the customer case as a data dump; use support-shell to retrieve the data.

Thanks

Comment 5 Andreas Karis 2020-06-28 14:06:26 UTC
So the workaround for me is:
~~~
(undercloud) [stack@undercloud-0 ~]$ cat virt/disable-dvr.yaml 
parameter_defaults:
  NeutronEnableDVR: false
~~~

Comment 6 Roman Safronov 2020-06-29 09:40:47 UTC
Seems like the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1837558

It was fixed in ovn2.13-2.13.0-33.el7fdn (will be available in the new FDP release, 20.E), see https://bugzilla.redhat.com/show_bug.cgi?id=1836976

Comment 7 Roman Safronov 2020-06-29 09:44:45 UTC
Sorry, for OSP 16 the fixed OVN version should be ovn2.11-2.11.1-47.el8fdp
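
A rough way to check whether an installed build is at least the fixed OSP 16 build named above is sketched below. The helper names `parse_nvr` and `has_fix` are hypothetical, and the comparison is a simplification that only handles purely numeric version and release fields; it is not a replacement for RPM's own version comparison. The "installed" value in the usage lines is an invented example, not taken from this environment.

```python
import re

FIXED_BUILD = "ovn2.11-2.11.1-47.el8fdp"  # fixed build named in this bug for OSP 16


def parse_nvr(nvr):
    """Split 'name-version-release.dist' into (name, version tuple, release number)."""
    match = re.match(r"^(?P<name>.+)-(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)", nvr)
    if match is None:
        raise ValueError(f"unrecognized package string: {nvr!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version, int(match.group("release"))


def has_fix(installed, fixed=FIXED_BUILD):
    """Simplified numeric comparison of (version, release); assumes the same package name."""
    name_i, version_i, release_i = parse_nvr(installed)
    name_f, version_f, release_f = parse_nvr(fixed)
    if name_i != name_f:
        raise ValueError(f"different packages: {name_i!r} vs {name_f!r}")
    return (version_i, release_i) >= (version_f, release_f)


if __name__ == "__main__":
    # Hypothetical installed build, for illustration only.
    print(has_fix("ovn2.11-2.11.1-33.el8fdp"))
    print(has_fix(FIXED_BUILD))
```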

Comment 8 Jakub Libosvar 2020-06-30 11:34:58 UTC
Roman is right, this is a bug in OVN tracked by bug 1837558

*** This bug has been marked as a duplicate of bug 1837558 ***