Bug 2017424

Summary:	VM with port connected to external network does not reach the ovnmetadata namespace
Product:	Red Hat OpenStack	Reporter:	Eduardo Olivares <eolivare>
Component:	rhosp-openvswitch	Assignee:	OSP Team <rhos-maint>
Status:	CLOSED DUPLICATE	QA Contact:	OSP Team <rhos-maint>
Severity:	high	Docs Contact:
Priority:	urgent
Version:	16.2 (Train)	CC:	apevec, ctrautma, egarciar, ekuris, ihrachys, jiji, kfida, lhh, majopela, mburns, nusiddiq, ralonsoh, rsafrono, scohen
Target Milestone:	z7	Keywords:	AutomationBlocker
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	2018365 2018980		Environment:
Last Closed:	2021-11-10 14:19:41 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2018365, 2018980

Description Eduardo Olivares 2021-10-26 13:50:03 UTC

Description of problem:
This issue reproduces on an RHOS 16.2 environment using ovn-2021-21.09.0-12.

There is a problem with VMs connected directly to the external/provider network: they have no connectivity to the metadata service. When I ping the IP of the metadata namespace from the VM instance while capturing traffic with tcpdump inside the metadata namespace, no packets are captured at all.

This issue is not a race condition; it always happens with ovn-2021-21.09.0-12 + an external network with vlan-transparency=True.
It doesn't occur with instances connected to a tenant network with ovn-2021-21.09.0-12.
It doesn't occur with instances connected to the external network with ovn-2021-21.09.0-12 when the external network has vlan-transparency=False.
It doesn't occur with ovn-2021-21.06.0-29, the latest OVN version officially included in RHOS 16.2.



# VM INSTANCE RUNNING ON COMPUTE-1 AND CONNECTED TO THE EXTERNAL NETWORK
[root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:32:c9:c7 brd ff:ff:ff:ff:ff:ff
    inet 10.218.0.188/24 brd 10.218.0.255 scope global dynamic noprefixroute eth0
       valid_lft 37959sec preferred_lft 37959sec
    inet6 fe80::f816:3eff:fe32:c9c7/64 scope link
       valid_lft forever preferred_lft forever
[root@localhost ~]# ip r
default via 10.218.0.10 dev eth0 proto dhcp metric 100
10.218.0.0/24 dev eth0 proto kernel scope link src 10.218.0.188 metric 100
169.254.169.254 via 10.218.0.160 dev eth0 proto dhcp metric 100
[root@localhost ~]# ping 10.218.0.160 -c1
PING 10.218.0.160 (10.218.0.160) 56(84) bytes of data.
From 10.218.0.188 icmp_seq=1 Destination Host Unreachable
 
--- 10.218.0.160 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
 
 
 
# METADATA NAMESPACE ON THE SAME COMPUTE-1 (NO PACKETS CAPTURED / EXPECTED ONE ICMP PACKET)
[root@compute-1 ~]# ip netns e ovnmeta-50aaa274-19b2-4d99-93fd-84843917fd27 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tap50aaa274-11@if279: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:3f:0b:3d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.218.0.160/24 brd 10.218.0.255 scope global tap50aaa274-11
       valid_lft forever preferred_lft forever
    inet 169.254.169.254/16 brd 169.254.255.255 scope global tap50aaa274-11
       valid_lft forever preferred_lft forever
 
[root@compute-1 ~]# ip netns e ovnmeta-50aaa274-19b2-4d99-93fd-84843917fd27 tcpdump -vne -i tap50aaa274-11
dropped privs to tcpdump
tcpdump: listening on tap50aaa274-11, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter





Version-Release number of selected component (if applicable):
ovn-2021-21.09.0-12

How reproducible:
100%

Steps to Reproduce:
1. Create an external network with vlan-transparency enabled.
2. Create a VM with a port connected to the external network.
3. Check that the VM cannot connect to the metadata service during the cloud-init stage.

Comment 2 Ihar Hrachyshka 2021-10-28 22:33:48 UTC

Several clarifications to the original description after spending quality time on the cluster w/ Numan (thanks!):

1) This is not specific to the ovnmeta- localport port; the whole network is affected, and broadcast traffic is not delivered to any port (the same issue affects communication between two VIF ports);

2) This affects all broadcast flows, not just ARP. We noticed the bug because VLAN-transparent networks disable the local ARP responder, which forces OVN to flood ARP requests originating from one switch port to all other ports; broken broadcast in transparent networks therefore means IP-to-MAC resolution doesn't work at all, which is much more visible than on regular networks;

3) We noticed that the flow in table=37 that is meant to fan out the same broadcast frame to all switch ports (including VIFs and localport ovnmeta- port) does resubmit() the frame to some ports but not others;

4) We noticed that the flow in table=37 sends the frame to the first port, which is a router port, but not the rest; example of the fanout flow:

 cookie=0xd28ed8be, duration=40941.405s, table=37, n_packets=26458, n_bytes=1259820, idle_age=0, hard_age=40376, priority=100,reg15=0x8000,metadata=0x1 actions=load:0x9->NXM_NX_REG15[],resubmit(,39),load:0x5->NXM_NX_REG15[],resubmit(,39),load:0x2->NXM_NX_REG15[],resubmit(,39),load:0x3->NXM_NX_REG15[],resubmit(,39),load:0x6->NXM_NX_REG15[],resubmit(,39),load:0x8000->NXM_NX_REG15[],resubmit(,38)

5) The first copy of the frame, fanned out to the router port, is processed through the tables until it reaches table=65, where it's redirected to the router datapath and resubmitted into table=8 again to run through the router pipeline; it's then blocked there;

6) One would expect that once the router copy of the frame is handled, OVN returns to the original action list in table=37 and continues delivering to the rest of the ports, but it doesn't: the counters in table=39 increase by 1, not by the number of ports in the table=37 action list;

7) We hypothesized that something in the router pipeline breaks in such a way that the whole pipeline is short-circuited;

8) One change in 21.09 related to the router pipeline that we were aware of is PMTU discovery enforcement, enabled by the gateway_mtu option set on the router port;

9) When we manually unset the option on the router port, the pipeline went back to fanning out broadcast traffic to all switch ports.
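A toy model of points 3-6, for illustration only: it parses the table=37 fanout flow quoted in point 4 and counts how many copies reach table=39. The `aborting_ports` set is a hypothetical stand-in for the suspected check_pkt_larger/PMTU problem; it is not how OVS actually executes actions.

```python
import re
from collections.abc import Collection

# Action list of the table=37 fanout flow from point 4 (copied verbatim).
FANOUT_ACTIONS = (
    "load:0x9->NXM_NX_REG15[],resubmit(,39),"
    "load:0x5->NXM_NX_REG15[],resubmit(,39),"
    "load:0x2->NXM_NX_REG15[],resubmit(,39),"
    "load:0x3->NXM_NX_REG15[],resubmit(,39),"
    "load:0x6->NXM_NX_REG15[],resubmit(,39),"
    "load:0x8000->NXM_NX_REG15[],resubmit(,38)"
)

# Each fanout step loads an output port key into reg15, then resubmits to a table.
STEP_RE = re.compile(r"load:(0x[0-9a-fA-F]+)->NXM_NX_REG15\[\],resubmit\(,(\d+)\)")


def parse_fanout(actions: str) -> list[tuple[int, int]]:
    """Return (reg15 port key, target table) pairs in execution order."""
    return [(int(key, 16), int(table)) for key, table in STEP_RE.findall(actions)]


def simulate(steps: list[tuple[int, int]],
             aborting_ports: Collection[int] = frozenset()) -> int:
    """Count table=39 hits; stop early if a port's pipeline aborts the whole list.

    'aborting_ports' is hypothetical: it models the suspected bug where the
    router copy of the frame short-circuits the rest of the action list.
    """
    table39_hits = 0
    for port_key, table in steps:
        if table == 39:
            table39_hits += 1
        if port_key in aborting_ports:
            break  # remaining actions are never executed
    return table39_hits


steps = parse_fanout(FANOUT_ACTIONS)
print([hex(k) for k, _ in steps])  # → ['0x9', '0x5', '0x2', '0x3', '0x6', '0x8000']
print(simulate(steps))  # → 5, expected: every port gets a copy
print(simulate(steps, aborting_ports={0x9}))  # → 1, as observed in point 6

# Hypothetical reordering: router port (0x9) handled after the other ports.
reordered = steps[1:5] + steps[:1] + steps[5:]
print(simulate(reordered, aborting_ports={0x9}))  # → 5
```

The reordered run reaches all five ports, which is in line with the idea that the bug may only show when the router port is not processed last. The model does not explain why a new network with the router port placed first did not reproduce the issue.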

We suspect a bug in the PMTU enforcement mechanism that prevents execution of the whole action list from table=37, and that the bug may only become visible when the router port is not the last one processed in the action list. Numan suggests the culprit is the check_pkt_larger implementation.

That said, we could not reproduce the bug on a new network that seems identical to the affected one, and attaching a router port to the network didn't reproduce it either (even though the router port was placed first in the list of fanout actions in the table=37 flow). So we are not sure exactly how PMTU affects the environment; we only know that unsetting gateway_mtu restores processing of the complete action list.

Also note that another PMTU-related bug was reported for the same environment that affects the N-S direction (not broadcast): https://bugzilla.redhat.com/show_bug.cgi?id=2018179

AFAIU check_pkt_larger is part of OVS proper, not OVN, and we will need to clone the bug to get a fix there. I am not sure whether anything can be optimized or fixed on the OVN side, though. Numan also suggested there may be some inefficiencies in how and when the OpenStack OVN driver sets gateway_mtu (setting it when it's not required, which exacerbates problems with the OVS packet length check). If that's the case, we may also want to clone the bug to openstack-neutron.

Side note: This is the 4th bug I am aware of that was revealed (but not necessarily triggered) by disabling the ARP responder for VLAN-transparent networks. It looks like people don't really use broadcast for anything but ARP resolution, and that is generally handled locally in OVN rather than delegated to port owners...

Comment 3 Numan Siddique 2021-10-28 22:39:44 UTC

IMO, to unblock this we can handle it in Neutron too. The Neutron ML2/OVN driver should set the gateway_mtu option only if the MTU of the Neutron Geneve networks is greater (or perhaps even lesser) than the provider network MTU. I think this would also help performance-wise, as there would be no need to check the packet length, and the check_pkt_len datapath action can't be offloaded.
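A minimal sketch of the rule proposed above, written as a hypothetical helper: `should_set_gateway_mtu` is not an existing Neutron function, and the MTU values below are illustrative only. The comment leaves open whether a smaller tenant MTU should also trigger the option, so that case is behind a flag.

```python
def should_set_gateway_mtu(tenant_mtu: int, provider_mtu: int,
                           also_when_smaller: bool = False) -> bool:
    """Hypothetical decision: does the router gateway port need gateway_mtu?

    Not the actual ML2/OVN driver code. Setting gateway_mtu makes OVN emit
    check_pkt_larger, i.e. the non-offloadable check_pkt_len datapath action,
    so this sketch only sets it when the MTUs actually differ.
    """
    if tenant_mtu > provider_mtu:
        return True
    # Open question from the comment: also set it when the tenant MTU is smaller?
    return also_when_smaller and tenant_mtu < provider_mtu


print(should_set_gateway_mtu(1442, 1500))  # → False
print(should_set_gateway_mtu(1500, 1450))  # → True
print(should_set_gateway_mtu(1442, 1500, also_when_smaller=True))  # → True
```

Keeping the "equal MTUs" case as a no-op is what would avoid the packet length check in the common setup.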

Comment 5 Ihar Hrachyshka 2021-10-28 23:00:39 UTC

Attached the OVN DBs, since Numan confirmed they can be used to reproduce the issue (note the "nova" switch).

Here is the output that shows what happens in the router port pipeline:

[root@ovn-chassis-1 data]# ovs-dpctl dump-flows | grep "in_port(9"
recirc_id(0),in_port(9),ct_state(-new-est-rel-rpl-inv-trk),ct_label(0/0x1),eth(src=fa:16:3e:ae:a5:59,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.218.0.227,tip=10.218.0.188,op=1/0xff,sha=fa:16:3e:ae:a5:59), packets:93, bytes:3906, used:0.997s, actions:check_pkt_len(size=1518,gt(drop),le(drop))
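A small sketch that decodes the datapath action above. It assumes OVN sets the check_pkt_larger threshold to gateway_mtu plus an 18-byte VLAN Ethernet header (14-byte Ethernet header + 4-byte 802.1Q tag); under that assumption, size=1518 corresponds to gateway_mtu=1500. The key point is that both branches are `drop`, so the broadcast ARP is dropped whatever its length and no output to the other switch ports survives in the datapath flow.

```python
import re

# Datapath actions from the ovs-dpctl dump-flows output above (copied verbatim).
DP_ACTIONS = "check_pkt_len(size=1518,gt(drop),le(drop))"

# Only handles flat branches such as gt(drop); nested actions need a real parser.
CHECK_RE = re.compile(r"check_pkt_len\(size=(\d+),gt\((.*?)\),le\((.*?)\)\)")

# Assumption: threshold = gateway_mtu + 14-byte Ethernet header + 4-byte VLAN tag.
VLAN_ETH_HEADER_LEN = 18


def parse_check_pkt_len(actions: str) -> tuple[int, str, str]:
    """Return (size, actions if longer than size, actions otherwise)."""
    m = CHECK_RE.search(actions)
    if m is None:
        raise ValueError(f"no check_pkt_len action in {actions!r}")
    return int(m.group(1)), m.group(2), m.group(3)


size, gt_branch, le_branch = parse_check_pkt_len(DP_ACTIONS)
print(size - VLAN_ETH_HEADER_LEN)  # → 1500, the implied gateway_mtu
# Both branches drop: the frame never reaches any output action.
print(gt_branch == le_branch == "drop")  # → True
```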

Comment 6 Ihar Hrachyshka 2021-10-29 00:39:10 UTC

Since this affects gateway_mtu handling, I think a Neutron-side workaround could be to set ovn_emit_need_to_frag to False (achieved by removing OVNEmitNeedToFrag: true from the job definition). That said, this should really be fixed on the OVS side. Numan suggested this: https://paste.centos.org/view/7db11803

Comment 8 Ihar Hrachyshka 2021-10-29 14:00:58 UTC

@Eduardo, don't get me wrong, I don't suggest to disable the option, esp. not permanently. Just threw an idea here in case it's important to make this bug go away temporarily in an environment.

Thanks for the OSP bug. Do we keep the OVN one?

Comment 9 Eduardo Olivares 2021-10-29 14:04:45 UTC

(In reply to Ihar Hrachyshka from comment #8)
> @Eduardo, don't get me wrong, I don't suggest to disable the option, esp.
> not permanently. Just threw an idea here in case it's important to make this
> bug go away temporarily in an environment.
Ack. Thanks for the clarification!

> 
> Thanks for the OSP bug. Do we keep the OVN one?
If the issue is going to be resolved on either OVS or OSP (or both), I guess the OVN bug can be closed.

Comment 10 Eran Kuris 2021-10-31 07:12:49 UTC

I think we can move the component and product of this issue instead of closing it.

Comment 12 Elvira 2021-11-10 14:19:41 UTC


*** This bug has been marked as a duplicate of bug 2018459 ***