Bug 1897103 - [OSP16.1][OVN] Nova metadata service is unreachable in non DVR telco OVN+DPDK deployment
Summary: [OSP16.1][OVN] Nova metadata service is unreachable in non DVR telco OVN+DPDK...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Rodolfo Alonso
QA Contact: Vadim Khitrin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-12 10:25 UTC by Vadim Khitrin
Modified: 2021-03-17 15:36 UTC
CC List: 19 users

Fixed In Version: python-networking-ovn-7.3.1-1.20201114024052.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 15:35:39 UTC
Target Upstream Version:
Embargoed:



Links
System                 | ID             | Private | Priority | Status | Summary                                          | Last Updated
Launchpad              | 1904871        | 0       | None     | None   | None                                             | 2020-11-20 11:45:16 UTC
OpenStack gerrit       | 763745         | 0       | None     | MERGED | [OVN] Ensure metadata checksum                   | 2021-02-19 09:04:39 UTC
OpenStack gerrit       | 772286         | 0       | None     | MERGED | [ovn] Metadata agent: fix checking datapath type | 2021-02-19 09:04:39 UTC
Red Hat Product Errata | RHBA-2021:0817 | 0       | None     | None   | None                                             | 2021-03-17 15:36:03 UTC

Description Vadim Khitrin 2020-11-12 10:25:16 UTC
Description of problem:
In a telco OVN-DPDK + SR-IOV deployment with DVR disabled, guest instances appear unable to download metadata from the Nova metadata API service.

All agents, including every networking-ovn-metadata-agent, are alive and up:
+--------------------------------------+------------------------------+-----------------------------------+-------------------+-------+-------+-------------------------------+
| ID                                   | Agent Type                   | Host                              | Availability Zone | Alive | State | Binary                        |
+--------------------------------------+------------------------------+-----------------------------------+-------------------+-------+-------+-------------------------------+
| 71723433-d4a9-4ee9-9d9c-3e846544e603 | NIC Switch agent             | computeovndpdksriov-1.localdomain | None              | :-)   | UP    | neutron-sriov-nic-agent       |
| e9f36491-bca9-4c3e-8be6-74793777fa29 | NIC Switch agent             | computeovndpdksriov-0.localdomain | None              | :-)   | UP    | neutron-sriov-nic-agent       |
| a15f7f3e-f8c2-408a-a9dd-68570f9797e4 | OVN Controller agent         | computeovndpdksriov-1.localdomain |                   | :-)   | UP    | ovn-controller                |
| 9e1a5e19-00fb-4229-b002-7832f90ababf | OVN Metadata agent           | computeovndpdksriov-1.localdomain |                   | :-)   | UP    | networking-ovn-metadata-agent |
| 6e4b94b6-70ce-44cf-ac0c-0869f9e146ac | OVN Controller agent         | computeovndpdksriov-0.localdomain |                   | :-)   | UP    | ovn-controller                |
| f252611c-f8c0-424e-adbc-916a7afd4fad | OVN Metadata agent           | computeovndpdksriov-0.localdomain |                   | :-)   | UP    | networking-ovn-metadata-agent |
| f4d136e4-cc7d-4941-9e6d-661c59808a81 | OVN Controller Gateway agent | controller-1.localdomain          |                   | :-)   | UP    | ovn-controller                |
| e6a989da-ed22-4246-949c-138252f33db7 | OVN Metadata agent           | controller-1.localdomain          |                   | :-)   | UP    | networking-ovn-metadata-agent |
| 54a453c8-0674-4b72-978a-eb9740d1a9c7 | OVN Controller Gateway agent | controller-0.localdomain          |                   | :-)   | UP    | ovn-controller                |
| 59417f27-d944-4220-8248-6b40862660d5 | OVN Metadata agent           | controller-0.localdomain          |                   | :-)   | UP    | networking-ovn-metadata-agent |
| d8eec782-46db-40ca-9646-ac3876543254 | OVN Controller Gateway agent | controller-2.localdomain          |                   | :-)   | UP    | ovn-controller                |
| 2d5695fd-7ed6-4041-80e2-c68ec4916e54 | OVN Metadata agent           | controller-2.localdomain          |                   | :-)   | UP    | networking-ovn-metadata-agent |
+--------------------------------------+------------------------------+-----------------------------------+-------------------+-------+-------+-------------------------------+

Inside the guest, the routing appears to be correct:
[root@localhost ~]# ip r
default via 20.10.114.254 dev eth0 proto dhcp metric 100
20.10.114.0/24 dev eth0 proto kernel scope link src 20.10.114.199 metric 100
169.254.169.254 via 20.10.114.100 dev eth0 proto dhcp metric 100

And the metadata IP address answers ICMP echo requests:
[root@localhost ~]# ping -c 1 169.254.169.254
PING 169.254.169.254 (169.254.169.254) 56(84) bytes of data.
64 bytes from 169.254.169.254: icmp_seq=1 ttl=64 time=0.845 ms

--- 169.254.169.254 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.845/0.845/0.845/0.000 ms

But we're unable to download data from the service:
[root@localhost ~]# curl http://169.254.169.254
curl: (7) Failed connect to 169.254.169.254:80; Connection timed out
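ICMP getting through while the TCP connection times out already narrows the problem to how TCP packets are handled on the path. As a minimal sketch for repeating this check from inside the guest (`probe_tcp` is a hypothetical helper, not part of any OpenStack tool), a timeout can be told apart from an immediate refusal:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connect attempt as "open", "refused" or "timeout".

    "timeout" typically means packets are being dropped silently on the
    path; "refused" means a RST came back because nothing listens on the
    port. Other OSErrors (for example, no route to host) are re-raised.
    """
    try:
        # create_connection performs the full TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
```

In the affected guest, `probe_tcp("169.254.169.254", 80)` would be expected to return "timeout", matching the curl output; "refused" would instead point at nothing listening behind the metadata address.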

On the hypervisor we can see the created namespace:
[root@computeovndpdksriov-1 ~]# ip netns
ovnmeta-5737c80f-2b2f-4614-b6b2-94b494001256 (id: 0)

Inside the namespace, both routes use the tap interface tap5737c80f-21:
[root@computeovndpdksriov-1 ~]# ip netns exec ovnmeta-5737c80f-2b2f-4614-b6b2-94b494001256 ip r
20.10.114.0/24 dev tap5737c80f-21 proto kernel scope link src 20.10.114.100
169.254.0.0/16 dev tap5737c80f-21 proto kernel scope link src 169.254.169.254

The OVS-side peer of that interface, tap5737c80f-20, is attached to br-int:
[root@computeovndpdksriov-1 ~]# ovs-vsctl show | less
    Bridge br-int
        fail_mode: secure
        datapath_type: netdev
        Port br-int
            Interface br-int
                type: internal
        Port tap5737c80f-20
            Interface tap5737c80f-20
        Port vhu17bd1131-4a
            Interface vhu17bd1131-4a
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/vhost_sockets/vhu17bd1131-4a"}
        Port ovn-54a453-0
            Interface ovn-54a453-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.10.111.180"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
        Port ovn-d8eec7-0
            Interface ovn-d8eec7-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.10.111.138"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
        Port ovn-f4d136-0
            Interface ovn-f4d136-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.10.111.194"}
                bfd_status: {diagnostic="No Diagnostic", flap_count="1", forwarding="true", remote_diagnostic="No Diagnostic", remote_state=up, state=up}
        Port patch-br-int-to-provnet-0c21d1e2-0d15-4467-bbcf-8d7d9c13549f
            Interface patch-br-int-to-provnet-0c21d1e2-0d15-4467-bbcf-8d7d9c13549f
                type: patch
                options: {peer=patch-provnet-0c21d1e2-0d15-4467-bbcf-8d7d9c13549f-to-br-int}
        Port ovn-6e4b94-0
            Interface ovn-6e4b94-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.10.111.197"}
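The `datapath_type: netdev` line shows that br-int uses the userspace (DPDK) datapath rather than the kernel one. On a live host, `ovs-vsctl get Bridge br-int datapath_type` reads that column directly. As a sketch for working from captured output such as the excerpt above, a hypothetical `bridge_datapath_types` helper could extract the value per bridge, assuming that a bridge with no such line uses the default kernel datapath ("system"):

```python
import re
from typing import Dict

BRIDGE_RE = re.compile(r"^\s*Bridge\s+(\S+)")
DATAPATH_RE = re.compile(r"^\s*datapath_type:\s*(\S+)")


def bridge_datapath_types(ovs_show: str) -> Dict[str, str]:
    """Map each bridge in `ovs-vsctl show` output to its datapath type.

    OVS treats an empty datapath_type as "system" (the kernel datapath),
    so bridges without a datapath_type line are reported as "system".
    "netdev" is the userspace datapath used with DPDK.
    """
    types: Dict[str, str] = {}
    current = None
    for line in ovs_show.splitlines():
        bridge = BRIDGE_RE.match(line)
        if bridge:
            # A new "Bridge <name>" line starts a new bridge section.
            current = bridge.group(1).strip('"')
            types[current] = "system"
            continue
        datapath = DATAPATH_RE.match(line)
        if datapath and current is not None:
            types[current] = datapath.group(1).strip('"')
    return types
```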

There don't appear to be any errors in /var/log/containers/neutron/ovn-metadata-agent.log, and I'm unsure how to debug this issue further.

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20201021.n.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy an OVN-DPDK + SR-IOV setup with DVR disabled
2. Spawn guest instances
3. From a guest, attempt to access the Nova metadata service over the network

Actual results:
Nova metadata service is unreachable

Expected results:
Nova metadata service is reachable

Additional info:
Will attach deployment templates and SOS report

Comment 2 Vadim Khitrin 2020-11-16 00:13:32 UTC
Forgot to mention that our Neutron networks are set with an MTU of 9000.

Together with Eran we've tried several things with no success:
1) Set ovn_emit_need_to_frag=True in the ML2 configuration.
2) Attempted to update the kernel of the overcloud nodes based on BZ#1854084.
3) Lowered the MTU of the Neutron networks to 1500.

Comment 20 Franck Baudin 2021-01-06 17:24:00 UTC
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1876459, the bug is fixed in OVS and this limitation will disappear as soon as the fix is backported to OVS 2.13, right?

Comment 21 Rodolfo Alonso 2021-01-07 15:37:56 UTC
Hi Franck:

BZ#1876459 fixes connection tracking (ct) for OVS-DPDK.

But the problem in this BZ is not ct; it is the checksum of traffic between a kernel port and a DPDK port. TCP traffic from the kernel namespace (OVN metadata) to OVS-DPDK is dropped because its checksum is calculated incorrectly. This is fixed with an iptables mangle rule. We don't use ct for the metadata traffic.

Regards.
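Comment 21 attributes the drop to the TCP checksum on packets leaving the kernel namespace. The iptables CHECKSUM target (valid only in the mangle table) with `--checksum-fill` makes the kernel compute and fill in that checksum itself instead of leaving it to checksum offload. The sketch below uses the standard RFC 793 / RFC 1071 computation to show what value ends up in the packet, plus a hypothetical `checksum_fill_argv` helper that builds such a rule only for `netdev` bridges. The exact rule (`CHECKSUM_FILL_RULE`) and the datapath condition are assumptions modelled on the titles of the two linked gerrit changes, not code copied from them.

```python
import ipaddress
import struct
from typing import List, Optional


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 one's-complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus a TCP segment.

    With the segment's checksum field (bytes 16-17) zeroed, the result is
    the value to store; over a correctly filled segment, the result is 0.
    """
    pseudo = (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!BBH", 0, 6, len(segment))  # zero, proto 6 (TCP), length
    )
    return ~ones_complement_sum16(pseudo + segment) & 0xFFFF


# Assumed rule, modelled on the fix described in Comment 21.
CHECKSUM_FILL_RULE = ["-p", "tcp", "-m", "tcp", "-j", "CHECKSUM", "--checksum-fill"]


def checksum_fill_argv(namespace: str, datapath_type: str) -> Optional[List[str]]:
    """Hypothetical: iptables command for `namespace`, or None if not needed."""
    if datapath_type != "netdev":
        return None  # kernel datapath: checksum handled without the rule
    return (["ip", "netns", "exec", namespace,
             "iptables", "-t", "mangle", "-A", "POSTROUTING"]
            + CHECKSUM_FILL_RULE)
```

For example, `checksum_fill_argv("ovnmeta-5737c80f-2b2f-4614-b6b2-94b494001256", "netdev")` returns an `ip netns exec` command that appends the rule to the mangle POSTROUTING chain of the namespace seen earlier, while a "system" datapath returns None. Because `-A` appends unconditionally, a real agent would first check whether the rule already exists.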

Comment 60 errata-xmlrpc 2021-03-17 15:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

