Bug 1854376 - ICMPv6 packets are going through the wrong path in br-int
Summary: ICMPv6 packets are going through the wrong path in br-int
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Adrián Moreno
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-07 10:34 UTC by Slawek Kaplonski
Modified: 2020-09-16 10:05 UTC (History)

Fixed In Version: openvswitch2.13-2.13.0-49.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1862153 (view as bug list)
Environment:
Last Closed:
Target Upstream Version:



Description Slawek Kaplonski 2020-07-07 10:34:19 UTC
Description of problem:
When we run tempest tests such as tempest.scenario.test_network_v6, they fail very often because the VM can't ping the IPv6 gateway address configured on the router.
It always happens with DVR routers and the ML2/OVS backend.

After investigating, I found that the problem only lasts for about 4-5 minutes after the VM is spawned. After that, everything works fine.

Let me describe the issue using my example.

VM port: 3179cf76-d61a-4443-917d-d0a4fecb95ee
MAC address: fa:16:3e:42:44:c2

Router: f9c07645-3c92-43e2-a950-e148ebb5e432
Router's port: e731be94-3a56-433e-a3f1-6275ccba5b67
MAC address: fa:16:3e:a7:3c:e2

Here is how it looks on the compute node:
- br-int http://pastebin.test.redhat.com/881998
- ovs-ofctl show br-int: http://pastebin.test.redhat.com/882000
- ovs-ofctl dump-flows br-int: http://pastebin.test.redhat.com/882002

The VM is trying to ping the IPv6 gateway on the DVR router, so the packet should go like this:

vm -> tap3179cf76-d6 -> [linux bridge] -> qvo3179cf76-d6 (br-int) -> qr-e731be94-3a (br-int)

But here is how it looks with tcpdump when it's not working:

- tcpdump on tap3179cf76-d6: http://pastebin.test.redhat.com/882005
- tcpdump on qr-e731be94-3a: http://pastebin.test.redhat.com/882007

As you can see, the ICMP reply is visible on the qr- interface but not on the tap. Both interfaces are in the br-int bridge on the same compute node.

I can also see those ICMP replies on the other compute node, on the interface used for the vxlan tunnels: http://pastebin.test.redhat.com/882010

I checked with ofproto/trace how such a packet should be handled by the OpenFlow rules: http://pastebin.test.redhat.com/882011. Here is how the fdb entries on br-int looked at that moment: http://pastebin.test.redhat.com/882012

ICMPv4 works fine at the same time that ICMPv6 is failing.



Version-Release number of selected component (if applicable):

[root@compute-1 heat-admin]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch
rhosp-openvswitch-2.13-8.el8ost.noarch
openvswitch2.13-2.13.0-25.el8fdp.1.x86_64
network-scripts-openvswitch2.13-2.13.0-25.el8fdp.1.x86_64


How reproducible:

Almost always when you run tempest tests from the tempest.scenario.test_network_v6 module in a DVR environment. I tried it on 3 controllers + 2 computes.

Comment 3 Timothy Redaelli 2020-07-24 19:57:23 UTC
After some regression tests I found that this commit:

commit dbf4a92800d0365cc3ec3c0e99df56e2ba676cb7
Author: Eli Britstein <elibr@mellanox.com>
Date:   Thu Mar 21 07:44:16 2019 +0000

    odp-util: Do not rewrite fields with the same values as matched
    
    To improve performance and avoid wasting resources for HW offloaded
    flows, do not rewrite fields that are matched with the same value.
    
    Reviewed-by: Roi Dayan <roid@mellanox.com>
    Signed-off-by: Eli Britstein <elibr@mellanox.com>
    Signed-off-by: Ben Pfaff <blp@ovn.org>

introduced the issue.

We still need to figure out whether it's a real bug introduced by the commit or whether only some flows need to be changed.

Comment 4 Ilya Maximets 2020-07-27 16:05:05 UTC
Adrian, Timothy, could you try this:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=192595

I don't know if it will fix the issue, but it seems relevant.

Comment 5 Adrián Moreno 2020-07-27 16:16:51 UTC
After Timothy kindly bisected the problem, I did some experimentation. Here is what I found:

- At the ofproto level, there is no explicit flow for traffic coming from the qrouter port in br-int, so it falls back to ACTION_NORMAL.
VM interface:
 27(qvo5b0e0123-0a): addr:fe:19:7e:45:b4:87
QRouter Interface:   
 28(qr-2ba2a180-bc): addr:fa:16:3e:bd:9a:76


   * On a working version this gets translated into:
(
  port 10: qvo5b0e0123-0a
  port 11: vxlan_sys_4789 (vxlan: packet_type=ptap)
  port 12: qr-2ba2a180-bc (internal)
)

recirc_id(0),in_port(12),eth(src=fa:16:3e:bd:9a:76,dst=fa:16:3e:5c:47:bb),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:118, used:3.485s, actions:10
   
   * However, on a non-working version (containing the patch pointed out by Timothy) the datapath rule is:
   
recirc_id(0),in_port(12),skb_mark(0),eth(src=fa:16:3e:bd:9a:76),eth_type(0x86dd),ipv6(tclass=0/0x3,frag=no), packets:268, bytes:28992, used:1.297s, actions:set(tunnel(tun_id=0x1,src=172.17.2.107,dst=172.17.2.59,ttl=64,tp_dst=4789,flags(df|key))),set(eth(src=fa:16:3f:41:1d:2f)),11

   
- According to the flow above, all IPv6 traffic (not just ICMPv6) coming from the qrouter is sent through the vxlan interface. As long as traffic keeps matching that flow, it stays installed in the datapath and the problem persists.

- Capturing on the input port (qr-2ba2a180-bc):
16:02:55.478056 fa:16:3e:5c:47:bb > fa:16:3e:bd:9a:76, ethertype 802.1Q (0x8100), length 122: vlan 10, p 0, ethertype IPv6, 2001:db8::f816:3eff:fe5c:47bb > 2001:db8::1: ICMP6, echo request, seq 1, length 64
16:02:55.478101 fa:16:3e:bd:9a:76 > fa:16:3e:5c:47:bb, ethertype 802.1Q (0x8100), length 122: vlan 10, p 0, ethertype IPv6, 2001:db8::1 > 2001:db8::f816:3eff:fe5c:47bb: ICMP6, echo reply, seq 1, length 64

- While this is happening, ofproto/trace reports the expected behavior (not the observed one):

$ ovs-appctl ofproto/trace br-int in_port=qr-2ba2a180-bc,dl_src=fa:16:3e:bd:9a:76,dl_dst=fa:16:3e:5c:47:bb,dl_vlan=10,icmp6
Flow: icmp6,in_port=28,dl_vlan=10,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:bd:9a:76,dl_dst=fa:16:3e:5c:47:bb,ipv6_src=::,ipv6_dst=::,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=0,icmp_type=0,icmp_code=0

bridge("br-int")
----------------
 0. priority 0, cookie 0xf522faab0d3e9672
    goto_table:60
60. dl_vlan=10,dl_dst=fa:16:3e:5c:47:bb, priority 20, cookie 0xf522faab0d3e9672
    pop_vlan
    output:27

Final flow: icmp6,in_port=28,vlan_tci=0x0000,dl_src=fa:16:3e:bd:9a:76,dl_dst=fa:16:3e:5c:47:bb,ipv6_src=::,ipv6_dst=::,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=0,icmp_type=0,icmp_code=0
Megaflow: recirc_id=0,eth,ipv6,in_port=28,dl_vlan=10,dl_vlan_pcp=0,dl_dst=fa:16:3e:5c:47:bb,nw_frag=no
Datapath actions: pop_vlan,10

- If "ovs-appctl dpctl/del-flows" is run on the affected node to trigger a re-translation of the flows, the correct flow will be configured:


- Also, if a 5~10 delay is introduced before running the ping6 test, the problem does not occur.

I am not familiar with the code modified by the patch, but from what I've seen it seems that there is an initial IPv6 flow that then gets modified, and dl_dst fails to get added to the mask. I'm fine-tuning the testing scripts in order to catch those initial seconds that seem crucial.
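
The difference between the two datapath flows quoted in this comment can be checked mechanically. The following is a minimal Python sketch, not part of OVS: the helper name eth_match_fields is hypothetical, and it assumes the flow text has the same format as the dumps above. It extracts the fields matched inside eth(...) and shows that the destination MAC is missing from the match on the non-working version.

```python
import re

# Datapath flows copied from the dumps in this comment.
WORKING_FLOW = (
    "recirc_id(0),in_port(12),"
    "eth(src=fa:16:3e:bd:9a:76,dst=fa:16:3e:5c:47:bb),"
    "eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:118, "
    "used:3.485s, actions:10"
)
BROKEN_FLOW = (
    "recirc_id(0),in_port(12),skb_mark(0),"
    "eth(src=fa:16:3e:bd:9a:76),"
    "eth_type(0x86dd),ipv6(tclass=0/0x3,frag=no), packets:268, "
    "bytes:28992, used:1.297s, "
    "actions:set(tunnel(tun_id=0x1,src=172.17.2.107,dst=172.17.2.59,"
    "ttl=64,tp_dst=4789,flags(df|key))),"
    "set(eth(src=fa:16:3f:41:1d:2f)),11"
)


def eth_match_fields(flow: str) -> dict:
    """Return the fields matched inside eth(...) of a datapath flow string.

    Only the match part (before ' actions:') is inspected, so a
    set(eth(...)) action is never mistaken for a match.
    """
    match_part = flow.split(" actions:", 1)[0]
    # The lookbehind keeps us from matching a longer key ending in 'eth'.
    m = re.search(r"(?<![a-z_])eth\(([^)]*)\)", match_part)
    if m is None:
        return {}
    fields = {}
    for item in m.group(1).split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


for name, flow in (("working", WORKING_FLOW), ("broken", BROKEN_FLOW)):
    fields = eth_match_fields(flow)
    print(name, "matches on dst MAC:", "dst" in fields)
```

Because the broken flow matches only on the source MAC, every IPv6 packet from the qrouter port hits it regardless of destination, which is consistent with the behavior described above.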

Comment 6 Adrián Moreno 2020-07-27 16:59:28 UTC
(In reply to Ilya Maximets from comment #4)
> Adrian, Timothy, could you try this:
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=192595
> 
> I don't know if it will fix the issue, but it seems relevant.

It does!

Thanks Ilya.

Comment 7 Adrián Moreno 2020-07-28 07:26:29 UTC
For the sake of completeness, although Ilya's commit message explains the issue perfectly (https://patchwork.ozlabs.org/project/openvswitch/patch/20200727185848.4089473-1-i.maximets@ovn.org/), the flow that gets configured during the first seconds of VM boot time is:

recirc_id(0),in_port(12),skb_mark(0),eth(src=fa:16:3e:bd:9a:76),eth_type(0x86dd),ipv6(tclass=0/0x3,frag=no), packets:8, bytes:776, used:7.574s, actions:push_vlan(vid=10,pcp=0),2,set(tunnel(tun_id=0x1,src=172.17.2.107,dst=172.17.2.59,ttl=64,tp_dst=4789,flags(df|key))),set(eth(src=fa:16:3f:41:1d:2f)),pop_vlan,11,set(tunnel(tun_id=0x1,src=172.17.2.107,dst=172.17.2.139,ttl=64,tp_dst=4789,flags(df|key))),11,set(tunnel(tun_id=0x1,src=172.17.2.107,dst=172.17.2.106,ttl=64,tp_dst=4789,flags(df|key))),11,set(tunnel(tun_id=0x1,src=172.17.2.107,dst=172.17.2.27,ttl=64,tp_dst=4789,flags(df|key))),11,set(eth(src=fa:16:3e:bd:9a:76)),10,13

Without Ilya's patch, this gets modified into:

recirc_id(0),in_port(12),skb_mark(0),eth(src=fa:16:3e:bd:9a:76),eth_type(0x86dd),ipv6(tclass=0/0x3,frag=no), packets:0, bytes:0, used:never, actions:set(tunnel(tun_id=0x1,src=172.17.2.107,dst=172.17.2.59,ttl=64,tp_dst=4789,flags(df|key))),set(eth(src=fa:16:3f:41:1d:2f)),11

If you wait long enough for the first rule to disappear from the datapath, a correct one is inserted instead. This confirms that the issue is related to flow modification, as Ilya explains.

