Bug 1967367 - ovs requests icmpv6 csum update inadvertently, causes flow to not offload [NEEDINFO]
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.15
Version: FDP 21.D
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Aaron Conole
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks: 2172625
Reported: 2021-06-03 02:06 UTC by Jianlin Shi
Modified: 2023-08-02 13:44 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-16 10:01:22 UTC
Target Upstream Version:
Embargoed:
aconole: needinfo? (dceara)


Attachments
flows for br-int (156.61 KB, text/plain)
2021-06-03 02:08 UTC, Jianlin Shi


Links
System: Red Hat Issue Tracker, ID: FD-1342, Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2021-10-15 19:31:13 UTC

Description Jianlin Shi 2021-06-03 02:06:19 UTC
Description of problem:
IPv6 traffic between two mlx5_core VFs in an OVN setup is not hw offloaded.
IPv4 traffic between the same VFs is offloaded.

Version-Release number of selected component (if applicable):
openvswitch2.15-2.15.0-23.el8fdp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. setup vf:
echo 4 > /sys/bus/pci/devices/0000:3b:00.0/sriov_numvfs
echo 0000:3b:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind 
echo 0000:3b:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:3b:00.4 > /sys/bus/pci/drivers/mlx5_core/unbind 
echo 0000:3b:00.5 > /sys/bus/pci/drivers/mlx5_core/unbind
devlink dev eswitch set pci/0000:3b:00.0 mode switchdev

2. create guests and attach a VF to each guest:
virt-install --name g0 --vcpus=2 --ram=2048 --disk path=/var/lib/libvirt/images/g0.qcow2,device=disk,bus=virtio,format=qcow2 --network bridge=virbr0,model=virtio --boot hd --accelerate --force --graphics none --noautoconsole
virt-install --name g2 --vcpus=2 --ram=2048 --disk path=/var/lib/libvirt/images/g2.qcow2,device=disk,bus=virtio,format=qcow2 --network bridge=virbr0,model=virtio --boot hd --accelerate --force --graphics none --noautoconsole
cat vf.xml
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0x3b' slot='0x00' function='0x2'/>
</source>
<mac address='00:00:00:01:01:13'/>
</interface>
virsh attach-device g0 vf.xml
cat vf.xml
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0x3b' slot='0x00' function='0x4'/>
</source>
<mac address='00:00:00:01:02:13'/>
</interface>
virsh attach-device g2 vf.xml

3. add the representers to ovn:
ip link set eth0 down
ip link set eth0 name s_pf0vf0
ovs-vsctl add-port br-int s_pf0vf0 -- set interface s_pf0vf0 external_ids:iface-id=s_pf0vf0
ip link set s_pf0vf0 up
ip link set eth2 down
ip link set eth2 name s_pf0vf2
ovs-vsctl add-port br-int s_pf0vf2 -- set interface s_pf0vf2 external_ids:iface-id=s_pf0vf2
ip link set s_pf0vf2 up

ovn-nbctl ls-add ls1
ovn-nbctl ls-add ls2
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1
ovn-nbctl lsp-set-addresses ls1-lr1 router
        
ovn-nbctl lrp-add lr1 lr1-ls2 00:00:00:00:00:02 172.17.$ip_subnet.254/24 7777:$ip_subnet::a/64
ovn-nbctl lsp-add ls2 ls2-lr1
ovn-nbctl lsp-set-type ls2-lr1 router
ovn-nbctl lsp-set-options ls2-lr1 router-port=lr1-ls2
ovn-nbctl lsp-set-addresses ls2-lr1 router

ovn-nbctl lsp-add ls1 s_pf0vf0
ovn-nbctl lsp-set-addresses s_pf0vf0 "00:00:00:01:01:11 192.168.1.11 2001::11"
ovn-nbctl lsp-add ls2 s_pf0vf2
ovn-nbctl lsp-set-addresses s_pf0vf2 "00:00:00:01:02:11 172.17.174.11 7777:174::11"

4. enable hw offload for ovs:
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch

5. ping6 7777:174::11 on g0.
Actual results:
ipv6 packets can be captured on the representer s_pf0vf2, so the flow is not hw offloaded

Expected results:
ipv6 packets should not be captured on the representer s_pf0vf2, i.e. the flow should be hw offloaded

Additional info:

[root@wsfd-advnetlab18 ~]# ovn-nbctl show
switch c36f3372-3e51-4c9a-a706-9bc447c3c913 (ls1)
    port s_pf1vf1
        addresses: ["00:00:00:01:01:16 192.168.1.16 2001::16"]
    port c_pf0vf1
        addresses: ["00:00:00:01:01:14 192.168.1.14 2001::14"]
    port c_pf0vf0
        addresses: ["00:00:00:01:01:13 192.168.1.13 2001::13"]
    port s_pf1vf0
        addresses: ["00:00:00:01:01:15 192.168.1.15 2001::15"]
    port ls1-lr1
        type: router
        router-port: lr1-ls1
    port s_pf0vf0
        addresses: ["00:00:00:01:01:11 192.168.1.11 2001::11"]
    port s_pf0vf1
        addresses: ["00:00:00:01:01:12 192.168.1.12 2001::12"]
switch 5dc10d4b-56d8-47a1-84d4-bf82599c078b (ls2)
    port c_pf0vf2
        addresses: ["00:00:00:01:02:13 172.17.174.13 7777:174::13"]
    port ls2-lr1
        type: router
        router-port: lr1-ls2
    port s_pf0vf3
        addresses: ["00:00:00:01:02:12 172.17.174.12 7777:174::12"]
    port c_pf0vf3
        addresses: ["00:00:00:01:02:14 172.17.174.14 7777:174::14"]
    port s_pf0vf2
        addresses: ["00:00:00:01:02:11 172.17.174.11 7777:174::11"]
router f8f6b318-42d0-45b9-8aad-f29c3cab59cf (lr1)
    port lr1-ls1
        mac: "00:00:00:00:00:01"
        networks: ["192.168.1.254/24", "2001::a/64"]
    port lr1-ls2
        mac: "00:00:00:00:00:02"
        networks: ["172.17.174.254/24", "7777:174::a/64"]

[root@wsfd-advnetlab18 ~]# ovs-appctl dpctl/dump-flows -m --names  
ufid:005fab43-9392-4e7b-b4cd-049d0cf8a46b, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:01:11,dst=00:00:00:00:00:01),eth_type(0x0800),ipv4(src=192.168.1.0/255.255.255.128,dst=172.17.174.11,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:2268, bytes:222264, used:0.340s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:02,dst=00:00:00:01:02:11)),set(ipv4(ttl=63)),s_pf0vf2

<==== ipv4 is offloaded

ufid:14952cf7-4625-4f21-9b13-47a74c217761, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:01:11,dst=00:00:00:00:00:01),eth_type(0x86dd),ipv6(src=2001::/ffff:ffff:ffff:ffff::,dst=7777:174::11,label=0/0,proto=58,tclass=0/0,hlimit=64,frag=no),icmpv6(type=0/0,code=0/0), packets:17, bytes:1768, used:0.981s, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:02,dst=00:00:00:01:02:11)),set(ipv6(hlimit=63)),s_pf0vf2

<==== ipv6 is not offloaded

ufid:a6055232-a637-4f7a-9615-c772d29e7675, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf2),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:02:11,dst=00:00:00:00:00:02),eth_type(0x0800),ipv4(src=172.17.174.0/255.255.255.128,dst=192.168.1.11,proto=1,tos=0/0,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:2268, bytes:222264, used:0.340s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:01,dst=00:00:00:01:01:11)),set(ipv4(ttl=63)),s_pf0vf0
ufid:3e8cc87a-7746-411a-bff0-815ea790fa57, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf2),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:02:11,dst=00:00:00:00:00:02),eth_type(0x86dd),ipv6(src=7777:174::/ffff:ffff:ffff:ffff::,dst=2001::11,label=0/0,proto=58,tclass=0/0,hlimit=64,frag=no),icmpv6(type=0/0,code=0/0), packets:17, bytes:1768, used:0.980s, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:01,dst=00:00:00:01:01:11)),set(ipv6(hlimit=63)),s_pf0vf0


[root@wsfd-advnetlab18 ~]# uname -a
Linux wsfd-advnetlab18.anl.lab.eng.bos.redhat.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@wsfd-advnetlab18 ~]# rpm -qa | grep -E "openvswitch2.15|ovn-2.21"
python3-openvswitch2.15-2.15.0-23.el8fdp.x86_64
openvswitch2.15-2.15.0-23.el8fdp.x86_64
ovn-2021-21.03.0-40.el8fdp.x86_64
ovn-2021-host-21.03.0-40.el8fdp.x86_64
ovn-2021-central-21.03.0-40.el8fdp.x86_64

[root@wsfd-advnetlab18 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-305.el8.x86_64 root=/dev/mapper/rhel_wsfd--advnetlab18-root ro crashkernel=auto resume=/dev/mapper/rhel_wsfd--advnetlab18-swap rd.lvm.lv=rhel_wsfd-advnetlab18/root rd.lvm.lv=rhel_wsfd-advnetlab18/swap console=ttyS1,115200 intel_iommu=on iommu=pt

[root@wsfd-advnetlab18 ~]# ethtool  -i ens1f0
driver: mlx5e_rep
version: 4.18.0-305.el8.x86_64
firmware-version: 16.27.2008 (MT_0000000013)
expansion-rom-version: 
bus-info: 0000:3b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
[root@wsfd-advnetlab18 ~]# ip link sh ens1f0
249: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0c:42:a1:08:0b:02 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:01:01:11 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 1     link/ether 00:00:00:01:01:12 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 2     link/ether 00:00:00:01:02:11 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
    vf 3     link/ether 00:00:00:01:02:12 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
[root@wsfd-advnetlab18 ~]# ethtool  -i s_pf0vf0
driver: mlx5e_rep
version: 4.18.0-305.el8.x86_64
firmware-version: 16.27.2008 (MT_0000000013)
expansion-rom-version: 
bus-info: 
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
[root@wsfd-advnetlab18 ~]# ethtool  -i s_pf0vf2
driver: mlx5e_rep
version: 4.18.0-305.el8.x86_64
firmware-version: 16.27.2008 (MT_0000000013)
expansion-rom-version: 
bus-info: 
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

[root@wsfd-advnetlab18 ~]# lspci | grep mell -i
3b:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
3b:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
3b:00.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
3b:00.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
3b:00.4 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
3b:00.5 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
3b:01.2 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]
3b:01.3 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function]

Comment 1 Jianlin Shi 2021-06-03 02:08:24 UTC
Created attachment 1788771 [details]
flows for br-int

Comment 2 Jianlin Shi 2021-06-03 02:08:44 UTC
[root@wsfd-advnetlab18 ~]# ovs-vsctl show
928f0153-86ca-4031-9f1d-73287246b0aa
    Bridge br-int
        fail_mode: secure
        Port s_pf1vf0
            Interface s_pf1vf0
        Port s_pf0vf3
            Interface s_pf0vf3
        Port br-int
            Interface br-int
                type: internal
        Port s_pf0vf0
            Interface s_pf0vf0
        Port ovn-hv0-0
            Interface ovn-hv0-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="20.0.174.26"}
        Port s_pf0vf2
            Interface s_pf0vf2
        Port s_pf1vf1
            Interface s_pf1vf1
        Port s_pf0vf1
            Interface s_pf0vf1
    ovs_version: "2.15.1"

Comment 3 Jianlin Shi 2021-06-11 03:18:03 UTC
tcp and udp are offloaded:

[root@wsfd-advnetlab16 ~]# ovs-appctl dpctl/dump-flows -m --names                                     
ufid:176d34f0-cc49-490b-9234-e6997ba86aa5, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:01:11,dst=00:00:00:00:00:01),eth_type(0x0800),ipv4(src=192.168.1.0/255.255.255.128,dst=64.0.0.0/224.0.0.0,proto=0/0,tos=0/0,ttl=64,frag=no), packets:0, bytes:0, used:never, dp:tc, actions:ct_clear
ufid:cfedf95e-7647-4700-b80c-980d3df4f0d5, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:01:11,dst=00:00:00:00:00:01),eth_type(0x86dd),ipv6(src=2001::/ffff:ffff:ffff:ffff::,dst=7777:172::11,label=0/0,proto=6,tclass=0/0,hlimit=64,frag=no),tcp(src=0/0,dst=0/0), packets:2097272, bytes:3170661917, used:0.521s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:02,dst=00:00:00:01:02:11)),set(ipv6(hlimit=63)),s_pf0vf2
ufid:d357c98f-62a1-4bc9-8e40-c826288aba02, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf2),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:02:11,dst=00:00:00:00:00:02),eth_type(0x86dd),ipv6(src=7777:172::/ffff:ffff:ffff:ffff::,dst=2001::11,label=0/0,proto=6,tclass=0/0,hlimit=64,frag=no),tcp(src=0/0,dst=0/0), packets:22151, bytes:1904998, used:0.520s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:01,dst=00:00:00:01:01:11)),set(ipv6(hlimit=63)),s_pf0vf0
ufid:c7aef2c3-9dba-44fb-ab61-d14b2fd8d1f2, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(s_pf0vf0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:01:01:11,dst=00:00:00:00:00:01),eth_type(0x0806),arp(sip=192.168.1.11,tip=192.168.1.254,op=1/0xff,sha=00:00:00:01:01:11,tha=00:00:00:00:00:00), packets:0, bytes:0, used:never, dp:ovs, actions:userspace(pid=3636835820,slow_path(action))
[root@wsfd-advnetlab16 ~]# ovs-appctl dpctl/dump-flows -m --names                                     
ufid:d4f2d986-a742-4104-8859-6461588be0e5, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:01:11,dst=00:00:00:00:00:01),eth_type(0x86dd),ipv6(src=2001::/ffff:ffff:ffff:ffff::,dst=7777:172::11,label=0/0,proto=6,tclass=0/0,hlimit=64,frag=no),tcp(src=0/0,dst=0/0), packets:10, bytes:1023, used:0.180s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:02,dst=00:00:00:01:02:11)),set(ipv6(hlimit=63)),s_pf0vf2
ufid:e76a055c-ecfb-4d39-8be8-2ce235982e6c, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf0),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:01:11,dst=00:00:00:00:00:01),eth_type(0x86dd),ipv6(src=2001::/ffff:ffff:ffff:ffff::,dst=7777:172::11,label=0/0,proto=17,tclass=0/0,hlimit=64,frag=no),udp(src=0/0,dst=0/0), packets:92, bytes:137080, used:0.180s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:02,dst=00:00:00:01:02:11)),set(ipv6(hlimit=63)),s_pf0vf2
ufid:e1ca52da-64d6-4d00-a2b6-0bf533004e8b, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf2),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:02:11,dst=00:00:00:00:00:02),eth_type(0x86dd),ipv6(src=7777:172::/ffff:ffff:ffff:ffff::,dst=2001::11,label=0/0,proto=6,tclass=0/0,hlimit=64,frag=no),tcp(src=0/0,dst=0/0), packets:7, bytes:607, used:0.181s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:01,dst=00:00:00:01:01:11)),set(ipv6(hlimit=63)),s_pf0vf0
ufid:21199520-c3b6-4715-a4fb-959170e300db, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf2),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:02:11,dst=00:00:00:00:00:02),eth_type(0x86dd),ipv6(src=7777:172::/ffff:ffff:ffff:ffff::,dst=2001::11,label=0/0,proto=17,tclass=0/0,hlimit=64,frag=no),udp(src=0/0,dst=0/0), packets:0, bytes:0, used:1.220s, offloaded:yes, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:01,dst=00:00:00:01:01:11)),set(ipv6(hlimit=63)),s_pf0vf0

only icmp6 is not offloaded
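
The comparison made in the dumps above (flows that show dp:tc but no offloaded:yes) can be filtered mechanically. A minimal sketch, assuming the dump-flows output format shown in this bug; list_tc_not_offloaded is a hypothetical helper name, not part of OVS:

```shell
# Hypothetical helper: read "dpctl/dump-flows -m" output on stdin and print
# the ufid and IP protocol number of every flow that was installed through TC
# (dp:tc) but is not marked "offloaded:yes".
list_tc_not_offloaded() {
    awk '/dp:tc/ && !/offloaded:yes/ {
        ufid = $1
        sub(/,$/, "", ufid)              # drop the trailing comma after the ufid
        proto = "?"
        if (match($0, /proto=[0-9]+/))   # e.g. proto=58 for ICMPv6
            proto = substr($0, RSTART + 6, RLENGTH - 6)
        print ufid, "proto=" proto
    }'
}

# Usage on a live system (needs a running ovs-vswitchd):
#   ovs-appctl dpctl/dump-flows -m --names | list_tc_not_offloaded
```

Flows whose match wildcards the protocol (proto=0/0) are reported as proto=0.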

Comment 4 Marcelo Ricardo Leitner 2021-06-15 13:16:54 UTC
Hi Alaa,

Any known limitation on offloading icmp6 as above in comment #0? It is using dp:tc, but not getting offloaded.

Jianlin, maybe you can grab an extack message out of the failure to offload with the new perf probe that was added via https://bugzilla.redhat.com/show_bug.cgi?id=1956983

Thanks,
Marcelo

Comment 5 Jianlin Shi 2021-06-16 06:40:06 UTC
(In reply to Marcelo Ricardo Leitner from comment #4)
> Hi Alaa,
> 
> Any known limitation on offloading icmp6 as above in comment #0? It is using
> dp:tc, but not getting offloaded.
> 
> Jianlin, maybe you can grab an extack message out of the failure to offload
> with the new perf probe that was added via
> https://bugzilla.redhat.com/show_bug.cgi?id=1956983
> 
> Thanks,
> Marcelo

[root@wsfd-advnetlab16 ~]# uname -a
Linux wsfd-advnetlab16.anl.lab.eng.bos.redhat.com 4.18.0-312.el8.x86_64 #1 SMP Wed Jun 2 16:30:46 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@wsfd-advnetlab16 ~]# perf script -v
build id event received for [kernel.kallsyms]: d62e5584320038183a19ae49ebdee183cbdf29b5 [20]
build id event received for [vdso]: 8eebcc6a6fa31251db815633bb5eb9259bbbcb55 [20]
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
        handler2  8892 [010] 11156.013083: netlink:netlink_extack: msg=mlx5_core: can't offload TC csum action for some header/s
        handler2  8892 [010] 11156.029937: netlink:netlink_extack: msg=mlx5_core: can't offload TC csum action for some header/s
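
To pull only the extack text out of perf script output like the above, a small filter is enough. This is a sketch that assumes the netlink:netlink_extack event name and the line layout shown here; extack_messages is a hypothetical helper name, and the capture commands in the comments are an assumption about how such a trace could be recorded:

```shell
# Hypothetical helper: keep only the text after "msg=" on netlink_extack
# tracepoint lines and print each distinct message once.
extack_messages() {
    sed -n 's/.*netlink:netlink_extack: msg=//p' | sort -u
}

# Possible usage (assumption: run as root on a kernel that exposes the
# netlink:netlink_extack tracepoint, while the failing flow is being installed):
#   perf record -e netlink:netlink_extack -a -- sleep 10
#   perf script | extack_messages
```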

Comment 6 Alaa Hleihel (NVIDIA Mellanox) 2021-06-16 10:01:22 UTC
Thanks, Jianlin. 

There is another message in dmesg log with the problematic flag.
From your system:

/var/log/messages:Jun 16 04:56:28 wsfd-advnetlab16 kernel: s_pf0vf0: can't offload TC csum action for some header/s - flags 0x2
                                                                                                                      ^^^^^^^^^^

Flag 0x2 is TCA_CSUM_UPDATE_FLAG_ICMP
And we can see in function csum_offload_supported() that csum on ICMP is not supported:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c#L2836

rewrite ICMP header + TC csum action is not supported, so closing the BZ.

Regards
Alaa
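
For reference, the flags value in that log line is a bitmask of the TCA_CSUM_UPDATE_FLAG_* values. A minimal decoding sketch follows; decode_csum_flags is a hypothetical helper, and the bit values are taken from the kernel header include/uapi/linux/tc_act/tc_csum.h, so they should be checked against the kernel tree in use:

```shell
# Hypothetical helper: turn a TC csum flags bitmask (decimal or 0x-prefixed
# hex) into the names of the TCA_CSUM_UPDATE_FLAG_* bits that are set.
decode_csum_flags() {
    flags=$(( $1 ))
    out=""
    # bit:name pairs, assumed to match linux/tc_act/tc_csum.h
    for pair in 1:IPV4HDR 2:ICMP 4:IGMP 8:TCP 16:UDP 32:UDPLITE 64:SCTP; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( flags & bit )) -ne 0 ]; then
            out="${out:+$out|}TCA_CSUM_UPDATE_FLAG_$name"
        fi
    done
    echo "${out:-none}"
}

# Usage:
#   decode_csum_flags 0x2
```

For the value in the log line above, decode_csum_flags 0x2 prints TCA_CSUM_UPDATE_FLAG_ICMP.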

Comment 7 Jianlin Shi 2021-06-17 03:45:45 UTC
(In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #6)
> Thanks, Jianlin. 
> 
> There is another message in dmesg log with the problematic flag.
> From your system:
> 
> /var/log/messages:Jun 16 04:56:28 wsfd-advnetlab16 kernel: s_pf0vf0: can't
> offload TC csum action for some header/s - flags 0x2
>                                                                             
> ^^^^^^^^^^
> 
> Flag 0x2 is TCA_CSUM_UPDATE_FLAG_ICMP
> And we can see in function csum_offload_supported() that csum on ICMP is not
> supported:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/
> mlx5/core/en_tc.c#L2836
> 
> rewrite ICMP header + TC csum action is not supported, so closing the BZ.

@dumitru, is the "rewrite ICMP header + TC csum action" required by the OVS flows added by OVN? The topology (ls-lr-ls) is described in the description.

> 
> Regards
> Alaa

Comment 8 Dumitru Ceara 2021-06-17 09:05:32 UTC
(In reply to Jianlin Shi from comment #7)
> (In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #6)
> > Thanks, Jianlin. 
> > 
> > There is another message in dmesg log with the problematic flag.
> > From your system:
> > 
> > /var/log/messages:Jun 16 04:56:28 wsfd-advnetlab16 kernel: s_pf0vf0: can't
> > offload TC csum action for some header/s - flags 0x2
> >                                                                             
> > ^^^^^^^^^^
> > 
> > Flag 0x2 is TCA_CSUM_UPDATE_FLAG_ICMP
> > And we can see in function csum_offload_supported() that csum on ICMP is not
> > supported:
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/
> > mlx5/core/en_tc.c#L2836
> > 
> > rewrite ICMP header + TC csum action is not supported, so closing the BZ.
> 
> @dumitru, is the "rewrite ICMP header + TC csum action" required by the OVS
> flows added by OVN? The topology (ls-lr-ls) is described in the description.

OVN adds flows that manipulate headers, including ICMP/ICMPv6.  OVN does *not*
control hw offload or any TC rules; that's external and handled by OVS.

However, looking at the flow that wasn't offloaded I don't see any ICMPv6 header
changes:

ufid:3e8cc87a-7746-411a-bff0-815ea790fa57, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf2),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:01:02:11,dst=00:00:00:00:00:02),eth_type(0x86dd),ipv6(src=7777:174::/ffff:ffff:ffff:ffff::,dst=2001::11,label=0/0,proto=58,tclass=0/0,hlimit=64,frag=no),icmpv6(type=0/0,code=0/0), packets:17, bytes:1768, used:0.980s, dp:tc, actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:01,dst=00:00:00:01:01:11)),set(ipv6(hlimit=63)),s_pf0vf0

The only ipv6 change there is decrementing hlimit due to routing.

> 
> > 
> > Regards
> > Alaa

Regards,
Dumitru

Comment 9 Marcelo Ricardo Leitner 2021-06-17 11:50:50 UTC
(In reply to Dumitru Ceara from comment #8)
> However, looking at the flow that wasn't offloaded I don't see any ICMPv6
> header changes:

Good point.

> 
> ufid:3e8cc87a-7746-411a-bff0-815ea790fa57,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),
> ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(s_pf0vf2),packet_type(ns=0/0,
> id=0/0),eth(src=00:00:00:01:02:11,dst=00:00:00:00:00:02),eth_type(0x86dd),
> ipv6(src=7777:174::/ffff:ffff:ffff:ffff::,dst=2001::11,label=0/0,proto=58,
> tclass=0/0,hlimit=64,frag=no),icmpv6(type=0/0,code=0/0), packets:17,
> bytes:1768, used:0.980s, dp:tc,
> actions:ct_clear,ct_clear,set(eth(src=00:00:00:00:00:01,dst=00:00:00:01:01:
> 11)),set(ipv6(hlimit=63)),s_pf0vf0
> 
> The only ipv6 change there is decrementing hlimit due to routing.

That 0x2 from dmesg on comment #6 means TCA_CSUM_UPDATE_FLAG_ICMP and apparently nobody else other than OVS specifies it.

  static inline int
  csum_update_flag(struct tc_flower *flower,
                   enum pedit_header_type htype) {
      /* Explicitly specify the csum flags so HW can return EOPNOTSUPP
       * if it doesn't support a checksum recalculation of some headers.
       * And since OVS allows a flow such as
       * eth(dst=<mac>),eth_type(0x0800) actions=set(ipv4(src=<new_ip>))
       * we need to force a more specific flow as this can, for example,
       * need a recalculation of icmp checksum if the packet that passes
       * is ICMPv6 and tcp checksum if it's tcp. */

      switch (htype) {
      case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
          flower->csum_update_flags |= TCA_CSUM_UPDATE_FLAG_IPV4HDR;
          /* Fall through. */
      case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6:
      case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
      case TCA_PEDIT_KEY_EX_HDR_TYPE_UDP:
          if (flower->key.ip_proto == IPPROTO_TCP) {
              flower->needs_full_ip_proto_mask = true;
              flower->csum_update_flags |= TCA_CSUM_UPDATE_FLAG_TCP;
          } else if (flower->key.ip_proto == IPPROTO_UDP) {
              flower->needs_full_ip_proto_mask = true;
              flower->csum_update_flags |= TCA_CSUM_UPDATE_FLAG_UDP;
          } else if (flower->key.ip_proto == IPPROTO_ICMP) {
              flower->needs_full_ip_proto_mask = true;
          } else if (flower->key.ip_proto == IPPROTO_ICMPV6) {
              flower->needs_full_ip_proto_mask = true;
              flower->csum_update_flags |= TCA_CSUM_UPDATE_FLAG_ICMP;

But AFAICT from nl_msg_put_flower_rewrite_pedits, that should be handling the per header pedit requests.
Let's reopen this one for now. There's still smoke coming from this bush.
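
To make the quoted branch logic concrete, the sketch below re-expresses only the excerpt above in shell. It is a simplified model, not the OVS code: the function name, the string header types and the numeric flag values (1 IPV4HDR, 2 ICMP, 8 TCP, 16 UDP) are assumptions for illustration.

```shell
# Simplified model of the quoted csum_update_flag() excerpt.
#   $1: rewritten header type (ip4, ip6, tcp or udp)
#   $2: IP protocol number matched by the flow
# Prints the resulting csum flags as hex.
csum_flags_for_rewrite() {
    htype=$1
    ip_proto=$2
    flags=0
    case $htype in
        ip4|ip6|tcp|udp) ;;
        *) echo "unhandled"; return 1 ;;   # branches not shown in the excerpt
    esac
    if [ "$htype" = ip4 ]; then
        flags=$(( flags | 1 ))             # TCA_CSUM_UPDATE_FLAG_IPV4HDR
    fi
    case $ip_proto in
        6)  flags=$(( flags | 8 )) ;;      # TCP
        17) flags=$(( flags | 16 )) ;;     # UDP
        1)  ;;                             # ICMP (v4): no extra flag in the excerpt
        58) flags=$(( flags | 2 )) ;;      # ICMPv6 -> TCA_CSUM_UPDATE_FLAG_ICMP
    esac
    printf '0x%x\n' "$flags"
}

# Usage:
#   csum_flags_for_rewrite ip6 58
```

With an IPv6 header rewrite (such as the hlimit decrement) on a flow matching proto=58, the model yields 0x2, the same value reported in dmesg in comment #6.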

Comment 10 Marcelo Ricardo Leitner 2021-06-17 18:29:10 UTC
It could be that tc.c:calc_offsets() is calculating something wrongly, confusing csum_update_flag() above. But this is just a theory ATM.

Comment 11 Jianlin Shi 2021-06-18 00:28:31 UTC
(In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #6)
> Thanks, Jianlin. 
> 
> There is another message in dmesg log with the problematic flag.
> From your system:
> 
> /var/log/messages:Jun 16 04:56:28 wsfd-advnetlab16 kernel: s_pf0vf0: can't
> offload TC csum action for some header/s - flags 0x2
>                                                                             
> ^^^^^^^^^^
> 
> Flag 0x2 is TCA_CSUM_UPDATE_FLAG_ICMP
> And we can see in function csum_offload_supported() that csum on ICMP is not
> supported:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/
> mlx5/core/en_tc.c#L2836
> 
> rewrite ICMP header + TC csum action is not supported, so closing the BZ.

will this kind of operation be supported in the future?
btw, do you have any documentation about what the hw offload supports?

> 
> Regards
> Alaa

Comment 12 Alaa Hleihel (NVIDIA Mellanox) 2021-06-21 09:00:30 UTC
(In reply to Jianlin Shi from comment #11)
> (In reply to Alaa Hleihel (NVIDIA Mellanox) from comment #6)
> > Thanks, Jianlin. 
> > 
> > There is another message in dmesg log with the problematic flag.
> > From your system:
> > 
> > /var/log/messages:Jun 16 04:56:28 wsfd-advnetlab16 kernel: s_pf0vf0: can't
> > offload TC csum action for some header/s - flags 0x2
> >                                                                             
> > ^^^^^^^^^^
> > 
> > Flag 0x2 is TCA_CSUM_UPDATE_FLAG_ICMP
> > And we can see in function csum_offload_supported() that csum on ICMP is not
> > supported:
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/
> > mlx5/core/en_tc.c#L2836
> > 
> > rewrite ICMP header + TC csum action is not supported, so closing the BZ.
> 
> will this kind of operation be supported in the future?

No, it's a HW limitation, so it won't be supported.

When doing a header rewrite, OVS can ask to recalculate the relevant L3/L4 checksums, and we do not support that for ICMPv6.

> btw, do you have any documentation about what the hw offload supports?

Unfortunately, there is no such document...

Comment 13 Marcelo Ricardo Leitner 2021-06-22 12:47:06 UTC
What I don't follow is why icmp6 checksum calc is getting activated.
It could be because it's matching against icmp type, but then, if it's a match, it shouldn't need to recompute the checksum.
"...,icmpv6(type=0/0,code=0/0)..."
maybe ovs is just being extra safe?
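
One detail relevant to the question above: per RFC 8200 section 8.1, the upper-layer checksum pseudo-header for IPv6 covers the source address, destination address, upper-layer length and next header, but not the hop limit, traffic class or flow label. A minimal sketch of that rule, using hypothetical field names modeled on the ipv6(...) keys in the dumps; it is not taken from OVS or from any proposed patch:

```shell
# Hypothetical helper: does rewriting this IPv6 header field change the
# pseudo-header that the ICMPv6/TCP/UDP checksum is computed over?
ip6_rewrite_needs_l4_csum() {
    case $1 in
        src|dst)             echo yes ;;   # part of the pseudo-header
        hlimit|tclass|label) echo no ;;    # not covered by the L4 checksum
        *)                   echo unknown; return 1 ;;
    esac
}

# Usage:
#   ip6_rewrite_needs_l4_csum hlimit
```

Under that rule, set(ipv6(hlimit=63)) on its own would not require an ICMPv6 checksum update.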

Comment 14 Marcelo Ricardo Leitner 2021-07-09 20:24:30 UTC
Please note that for HWOL I'm considering this bug as a low priority one.
ICMP is really low-volume traffic and doesn't impact the solution much.
If you disagree, please comment. Thanks.

Comment 17 Aaron Conole 2023-05-23 21:20:36 UTC
(In reply to Marcelo Ricardo Leitner from comment #13)
> What I don't follow is why icmp6 checksum calc is getting activated.
> It could be because it's matching against icmp type, but then, if it's a
> match, it shouldn't need to recompute the checksum.
> "...,icmpv6(type=0/0,code=0/0)..."
> maybe ovs is just being extra safe?

Hrrm... maybe we should consider a change like the following:

https://github.com/orgcandman/ovs/tree/rfc_csum_ip6

WDYT?

Comment 18 Marcelo Ricardo Leitner 2023-05-23 22:02:07 UTC
It's a step in the right direction, I think, but I don't get why some protocols got 'true' for the new flag, such as TCA_PEDIT_KEY_EX_HDR_TYPE_ETH and TCA_PEDIT_KEY_EX_HDR_TYPE_TCP.

