Bug 2030462

Summary: Performance drop while Mirroring OVS Ports in an OVS HW Offload Scenario
Product: Red Hat OpenStack Reporter: coldford <coldford>
Component: rhosp-openvswitchAssignee: OSP Team <rhos-maint>
Status: VERIFIED --- QA Contact: OSP Team <rhos-maint>
Severity: high Docs Contact:
Priority: high    
Version: 16.2 (Train)CC: atzin, broose, cfields, echaudro, fboboc, guilherme.giacon, hakhande, jfargen, jimm, john.choo, lariel, mburns, mleitner, nnarang, ptalbert, ralonsoh, rkhan, spower
Target Milestone: z3Keywords: Bugfix, OtherQA, TestOnly, Triaged
Target Release: 16.2 (Train on RHEL 8.4)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1915308, 2037489, 2042651    
Bug Blocks: 2131355    
Attachments:
Description Flags
net/mlx5: Set reformat action when needed for termination rules
none
net/mlx5: DR, Use FW API when updating FW-owned flow table none

Comment 5 Marcelo Ricardo Leitner 2021-12-08 22:56:53 UTC
FWIW, this extack msg will be logged on vswitchd log once we finish https://bugzilla.redhat.com/show_bug.cgi?id=1916418  (and of course, log level is adjusted properly)

Comment 7 Marcelo Ricardo Leitner 2021-12-09 15:26:56 UTC
As I don't have a timestamp to correlate, lets check all errors logged:

$ perf script  | sed -n 's/.*netlink://p' | sort -u
No kallsyms or vmlinux with build-id 5019ef4e3f199fc0f67e3b59495e1cdb92ca2e07 was found
netlink_extack: msg=mlx5_core: devices are not on same switch HW, can't offload forwarding
netlink_extack: msg=mlx5_core: The offload action is not supported
netlink_extack: msg=mlx5_core: Unsupported key

Flow in question:
ufid:a9b6fbe4-a76a-421f-bd14-905f7fec31d5, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens3f0_48),packet_type(ns=0/0,id=0/0),eth(src=80:19:00:00:00:01,dst=00:00:00:00:44:44),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0,frag=no), packets:981670521, bytes:4775302999578, used:0.000s, dp:tc, actions:set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75.125.168,ttl=64,flags(key))),gre_sys,push_vlan(vid=610,pcp=0),bond1

gre_sys action is supported, as there's this flow using it and got offloaded:
ufid:8b3584e2-d624-4d2e-a629-bcbd79eeeffb, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(bond1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:44:44,dst=80:19:00:00:00:01),eth_type(0x8100),vlan(vid=610,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0x3,ttl=0/0,frag=no)), packets:984521022, bytes:4851832336920, used:0.160s, offloaded:yes, dp:tc, actions:pop_vlan,ens3f0_48,set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75.125.168,ttl=64,flags(key))),gre_sys

So I'm thinking it's about the "Unsupported key" one. I don't know why, but the first is matching on:
ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0,frag=no)
                                             ^^^^^^^^^^^^

tclass goes with ipv6 flow label, and:
        FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_tags */

but:
mlx5/en_tc.c: __parse_cls_flower()

        if (dissector->used_keys &
            ~(BIT(FLOW_DISSECTOR_KEY_META) |
              BIT(FLOW_DISSECTOR_KEY_CONTROL) |
              BIT(FLOW_DISSECTOR_KEY_BASIC) |
              BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
              BIT(FLOW_DISSECTOR_KEY_VLAN) |
              BIT(FLOW_DISSECTOR_KEY_CVLAN) |
              BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS) |
              BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS) |
              BIT(FLOW_DISSECTOR_KEY_PORTS) |
              BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) |
              BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) |
              BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) |
              BIT(FLOW_DISSECTOR_KEY_ENC_PORTS) |
              BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) |
              BIT(FLOW_DISSECTOR_KEY_TCP) |
              BIT(FLOW_DISSECTOR_KEY_IP)  |
              BIT(FLOW_DISSECTOR_KEY_CT) |
              BIT(FLOW_DISSECTOR_KEY_ENC_IP) |
              BIT(FLOW_DISSECTOR_KEY_ENC_OPTS) |
              BIT(FLOW_DISSECTOR_KEY_MPLS))) {
                NL_SET_ERR_MSG_MOD(extack, "Unsupported key");              <-- our error msg
                netdev_dbg(priv->netdev, "Unsupported key used: 0x%x\n",    <-- [A]
                           dissector->used_keys);
                return -EOPNOTSUPP;
        }

FLOW_DISSECTOR_KEY_FLOW_LABEL is not listed as supported.
I see other flows using tclass=0/0x3, but they were handling broadcasts and outputting to multiple ports. They likely hit the error of not offloading because the interfaces are not on the same HW (different error).

Please enable that debug message and let's confirm that it is getting printed.
To enable it:
  # echo 'func __parse_cls_flower +p' > /sys/kernel/debug/dynamic_debug/control

You should see the log in dmesg or journal, depending on your configs.
If confirmed, we need to understand why ovs is adding this match.

Comment 8 Marcelo Ricardo Leitner 2021-12-09 15:29:14 UTC
Expected code to be logged on the msg is:
crash> px (int)FLOW_DISSECTOR_KEY_FLOW_LABEL
$2 = 0xb            <---

Comment 9 Marcelo Ricardo Leitner 2021-12-09 17:11:27 UTC
Realized during the mtg today that tclass=0/0x3 may be translated into ip_tos 0/0x3, which should be supported as I saw other flows using it and got offloaded. Hmmm. Still worth the test above, btw.

Comment 12 Marcelo Ricardo Leitner 2021-12-10 12:55:06 UTC
The number in unsupported key msg can be translated with this table:
http://pastebin.test.redhat.com/1014795 (stored forever)

Comment 13 Marcelo Ricardo Leitner 2021-12-10 13:15:12 UTC
Please note that the extack would be logged only once and only when the flow is created in the datapath.
I'm afraid we're not capturing the error due to test procedure. We need to wait for the hot flows to idle out and then try again. Only then the error will be logged.

Comment 17 Ariel Levkovich 2021-12-10 15:23:42 UTC
(In reply to Marcelo Ricardo Leitner from comment #7)
> As I don't have a timestamp to correlate, lets check all errors logged:
> 
> $ perf script  | sed -n 's/.*netlink://p' | sort -u
> No kallsyms or vmlinux with build-id
> 5019ef4e3f199fc0f67e3b59495e1cdb92ca2e07 was found
> netlink_extack: msg=mlx5_core: devices are not on same switch HW, can't
> offload forwarding
> netlink_extack: msg=mlx5_core: The offload action is not supported
> netlink_extack: msg=mlx5_core: Unsupported key
> 
> Flow in question:
> ufid:a9b6fbe4-a76a-421f-bd14-905f7fec31d5,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),
> ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens3f0_48),packet_type(ns=0/
> 0,id=0/0),eth(src=80:19:00:00:00:01,dst=00:00:00:00:44:44),eth_type(0x86dd),
> ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0,
> frag=no), packets:981670521, bytes:4775302999578, used:0.000s, dp:tc,
> actions:set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75.125.168,ttl=64,
> flags(key))),gre_sys,push_vlan(vid=610,pcp=0),bond1
> 
> gre_sys action is supported, as there's this flow using it and got offloaded:
> ufid:8b3584e2-d624-4d2e-a629-bcbd79eeeffb,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),
> ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(bond1),packet_type(ns=0/0,
> id=0/0),eth(src=00:00:00:00:44:44,dst=80:19:00:00:00:01),eth_type(0x8100),
> vlan(vid=610,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.
> 0.0/0.0.0.0,proto=0/0,tos=0/0x3,ttl=0/0,frag=no)), packets:984521022,
> bytes:4851832336920, used:0.160s, offloaded:yes, dp:tc,
> actions:pop_vlan,ens3f0_48,set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75.
> 125.168,ttl=64,flags(key))),gre_sys
> 
> So I'm thinking it's about the "Unsupported key" one. I don't know why, but
> the first is matching on:
> ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0,frag=no)
>                                              ^^^^^^^^^^^^
> 
> tclass goes with ipv6 flow label, and:
>         FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_tags */
> 
> but:
> mlx5/en_tc.c: __parse_cls_flower()
> 
>         if (dissector->used_keys &
>             ~(BIT(FLOW_DISSECTOR_KEY_META) |
>               BIT(FLOW_DISSECTOR_KEY_CONTROL) |
>               BIT(FLOW_DISSECTOR_KEY_BASIC) |
>               BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
>               BIT(FLOW_DISSECTOR_KEY_VLAN) |
>               BIT(FLOW_DISSECTOR_KEY_CVLAN) |
>               BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS) |
>               BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS) |
>               BIT(FLOW_DISSECTOR_KEY_PORTS) |
>               BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) |
>               BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) |
>               BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) |
>               BIT(FLOW_DISSECTOR_KEY_ENC_PORTS) |
>               BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) |
>               BIT(FLOW_DISSECTOR_KEY_TCP) |
>               BIT(FLOW_DISSECTOR_KEY_IP)  |
>               BIT(FLOW_DISSECTOR_KEY_CT) |
>               BIT(FLOW_DISSECTOR_KEY_ENC_IP) |
>               BIT(FLOW_DISSECTOR_KEY_ENC_OPTS) |
>               BIT(FLOW_DISSECTOR_KEY_MPLS))) {
>                 NL_SET_ERR_MSG_MOD(extack, "Unsupported key");             
> <-- our error msg
>                 netdev_dbg(priv->netdev, "Unsupported key used: 0x%x\n",   
> <-- [A]
>                            dissector->used_keys);
>                 return -EOPNOTSUPP;
>         }
> 
> FLOW_DISSECTOR_KEY_FLOW_LABEL is not listed as supported.
> I see other flows using tclass=0/0x3, but they were handling broadcasts and
> outputting to multiple ports. They likely hit the error of not offloading
> because the interfaces are not on the same HW (different error).
> 
> Please enable that debug message and let's confirm that it is getting
> printed.
> To enable it:
>   # echo 'func __parse_cls_flower +p' >
> /sys/kernel/debug/dynamic_debug/control
> 
> You should see the log in dmesg or journal, depending on your configs.
> If confirmed, we need to understand why ovs is adding this match.

Hi Marcelo, mlx5 checks the keys first, before parsing and checking the actions so if there were other flows with this FLOW_LABEL key I would expect them to fail on the unsupported key check (before they fail on the forwarding check)

Comment 20 Ariel Levkovich 2021-12-13 21:51:24 UTC
(In reply to Jamie Fargen from comment #19)
> We are now seeing these messages in /var/log/openvswitch/ovs-vswitchd.log.
> 
> 2021-12-13T21:09:18.214Z|00022|netdev_offload_tc(handler2)|DBG|offloading
> attribute icmpv6_type isn't supported
> 2021-12-13T21:10:01.247Z|00006|netdev_offload_tc(handler10)|DBG|offloading
> attribute icmpv6_type isn't supported
> 2021-12-13T21:10:02.068Z|00007|netdev_offload_tc(handler10)|DBG|offloading
> attribute icmpv6_type isn't supported
> 2021-12-13T21:10:03.213Z|00003|netdev_offload_tc(handler26)|DBG|offloading
> attribute icmpv6_type isn't supported

Indeed currently OVS code blocks offloading rules that match ICMPv6 type:

else if (key->dl_type == htons(ETH_TYPE_IPV6) &&
                 key->nw_proto == IPPROTO_ICMPV6) {
          if (mask->tp_src) {
              VLOG_DBG_RL(&rl,
                          "offloading attribute icmpv6_type isn't supported");
              return EOPNOTSUPP;
          }

Comment 21 Marcelo Ricardo Leitner 2021-12-13 21:53:56 UTC
And that should be fine (to not be offloaded). ICMPs are supposed to be low traffic.

Comment 25 Amir Tzin (Mellanox) 2021-12-16 19:43:52 UTC
Created attachment 1846641 [details]
net/mlx5: Set reformat action when needed for termination rules

Comment 26 Amir Tzin (Mellanox) 2021-12-16 19:45:54 UTC
Created attachment 1846642 [details]
net/mlx5: DR, Use FW API when updating FW-owned flow table

Comment 27 Amir Tzin (Mellanox) 2021-12-16 20:04:20 UTC
42b3d7b671b  ("net/mlx5: Set reformat action when needed for termination rules") from v5.13-rc4
a01a43fa16e1 ("net/mlx5: DR, Use FW API when updating FW-owned flow table") from v5.15-rc1
attached and cleanly applied on 8.4 branch are probably the needed fix for the issue.

brew build link:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41973349

Comment 28 Anita Tragler 2022-01-05 13:32:40 UTC
Need clone bugzilla for RHEL to bring in driver patches
Customer is requesting hotfix and this with patches when included with RHEL 8.4z

Comment 35 Rashid Khan 2022-01-05 21:57:21 UTC
Increased priority and severity as per the urgency shown by VZ for this.

Comment 44 Eelco Chaudron 2022-06-08 11:59:37 UTC
Ping? Is this item solved, or are we waiting for some testing? If so, any ETA?